The Indo-European Language Family


Modern languages like English, Spanish, Russian and Hindi as well as ancient 
languages like Greek, Latin and Sanskrit all belong to the Indo-European 
language family, which means that they all descend from a common ancestor. 
But how, more precisely, are the Indo-European languages related to each 
other? This book brings together pioneering research from a team of inter- 
national scholars to address this fundamental question. It provides an intro- 
duction to linguistic subgrouping and offers comprehensive, systematic and 
up-to-date analyses of the ten main branches of the Indo-European language 
family: Anatolian, Tocharian, Italic, Celtic, Germanic, Greek, Armenian, 
Albanian, Indo-Iranian and Balto-Slavic. By highlighting that these branches 
are saliently different from each other, yet at the same time display striking 
similarities, the book investigates the early diversification of the Indo- 
European language family, spoken today by half the world's population. 
This title is also available as open access on Cambridge Core. 
1 Introduction


Thomas Olander 


1.1 Background 


The study of the genealogical relationship between the Indo-European lan- 
guages has been the object of research ever since August Schleicher's famous 
Stammbaum representation of the then-known subgroups, or branches (1861: 
7; see also 1853: 787). Throughout most of the twentieth century, this topic 
played a less prominent role in Indo-European studies, but the last few decades 
have witnessed a surge of interest in the internal structure of the Indo-European
language family as well as other language families. 

From a methodological point of view, the renewed interest in linguistic 
phylogenetics, or “phylolinguistics”, came mainly from two sides, rather 
different in their choice of methods and data, yet both based on computational 
approaches. A group of researchers led by Don Ringe applied algorithms based 
on weighted maximum compatibility to a data set consisting of phonological 
and morphological characters and a list of basic vocabulary items from 
a selection of twenty-four Indo-European languages representing the individ- 
ual subgroups (Ringe, Warnow & Taylor 2002; Nakhleh, Ringe & Warnow 
2005). Another group, headed by Russell D. Gray, applied Bayesian methods to 
data sets exclusively consisting of lists of basic vocabulary (for the Indo- 
European language family, see e.g. Gray & Atkinson 2003; Bouckaert et al. 
2012); the same methods and data were used in Chang et al. 2015. 

Within Indo-European studies, the increasing interest in linguistic phyloge- 
netics has mainly taken its point of departure in traditional methodology, where 
subgroups are identified on the basis of significant shared innovations across 
related languages. It seems likely that specialists have become more interested 
in the branching structure of the family tree as a result, at least partly, of the 
growing acceptance of the Anatolian subgroup as a sister to all the remaining
Indo-European languages (see e.g. Kloekhorst 2008: 7-11; Kloekhorst &
Pronk 2019; Oettinger 2014; but cf. the more sceptical stance by Melchert in
press), which highlights the importance of the structure of the family tree for
the purposes of reconstruction.

This chapter was written in connection with the research projects Connecting the dots:
Reconfiguring the Indo-European family tree (2019–23), financed by the Independent Research
Fund Denmark, and LAMP: Languages and myths of prehistory (2020–5), financed by
Riksbankens Jubileumsfond. I am grateful to Simon Poulsen for reading and commenting on
a draft version of the chapter.

This book has grown out of a workshop held in Copenhagen in February 2017,
"The Indo-European Family Tree", where invited speakers discussed meth- 
odological issues and the phylogenetic relations of each of the main Indo- 
European subgroups. Some of the chapters of this book have been authored 
by participants in that workshop, while others have been written by authors 
invited to contribute to the book project. 

The Copenhagen workshop was organised within the framework of the 
research project The homeland: In the footprints of the early Indo- 
Europeans at the University of Copenhagen (2015-18, financed by the 
Carlsberg Foundation). The Homeland project was concerned with the 
location in time and space of the speakers of Proto-Indo-European and 
the early spread of the Indo-European language family throughout Europe 
and western Asia. Since the nodes of a linguistic family tree to a certain 
extent historically represent the geographical separation of the speakers, it 
is essential, when attempting to correlate prehistoric languages with 
material culture, to have a good understanding of the order of separation 
of the daughter languages from their common ancestor. Thus, the so- 
called Indo-European homeland problem and the problem of the structure 
of the Indo-European family tree are closely intertwined. Indeed, studies 
of linguistic phylogenetics are very often also concerned with the geog- 
raphy and time depth of the nodes in the tree, even if the methodologies 
involved are very different (for Indo-European see e.g. Nakhleh, Ringe & 
Warnow 2005 and Bouckaert et al. 2012). 

In its design and structure this book is rooted in the traditional meth- 
odology of linguistic subgrouping. This is not only because of the 
background to how the book was conceived. Over the last couple of 
decades, computer-assisted approaches have, in my view, received more 
attention than can be justified by the results they have produced. In some 
circles, especially within non-linguistic disciplines and among a broader 
audience (as exemplified by the media coverage of Bouckaert et al. 
2012), computer-assisted approaches seem to be more highly regarded 
than traditional studies. 

The impact of publications based on computer-assisted approaches has 
been very limited within the field of Indo-European studies itself, although 
the results achieved by the Ringe group have been somewhat successful 
(see Clackson 2015: 5). Interestingly, what we see is not a large-scale 
rejection of the findings of computer-assisted approaches by traditional 
Indo-European linguists. The findings are, in most cases, simply ignored, 
probably due to a combination of factors, including the fact that
computational phylogenetic studies are difficult to evaluate for non-computational
linguists. This is because the methods employed are very different from 
traditional methods in a number of ways. Firstly, the main focus of com- 
putational studies is often on the methodology and the results, rather than 
on the actual data, which are often full of errors. Secondly, computational 
studies are often written in a very technical language. And thirdly, the 
results are not thought to be of any actual value anyway as they are often 
based on material that is not considered to be particularly significant, while 
the most relevant material is ignored. 

Thus, in some sense, this book may be seen as a traditionalist reaction 
to modern computer-assisted approaches to linguistic Indo-European phy- 
logenetics. This does not mean, however, that the contributors to the book 
in any way have ignored the fact that such approaches may be of great 
benefit to linguistic phylogenetics in general or to Indo-European studies 
in particular; see the chapters by Clackson (Chapter 2), Piwowarczyk 
(Chapter 3) and Ringe (Chapter 4). What should be evident from the 
book is that traditional approaches still have a lot to offer, even though 
they require a high degree of specialisation, including a deep understand- 
ing of the comparative method and linguistic reconstruction as well as 
a profound knowledge of the relevant data, constituted by the phonology, 
morphology, syntax and lexicon of a large number of languages and their 
historical development from Proto-Indo-European to their attestation. As 
is often emphasised, computational methods cannot and should not replace
traditional historical linguistics but may prove to be a useful supplement
(Ringe, Warnow & Taylor 2002: 65–6; compare also the very enthusiastic
remarks on Bayesian linguistic phylogenetics by Greenhill, Heggarty & 
Gray 2021: 246 with the critical position by Ringe in Chapter 4 of this 
book). This book is thus, in some way, an attempt at reinvigorating the 
traditional methodology, which, outside Indo-European studies, seems to 
be losing ground to computationally based analyses. 

In traditional Indo-European linguistics, there are surprisingly few compre- 
hensive studies of the phylogeny of the language family. Two works that were 
influential in their time are Antoine Meillet's Les dialectes indo-européens 
(1908/1922) and Walter Porzig's Die Gliederung des indogermanischen 
Sprachgebiets (1954). Both works are now old and outdated in a number of 
respects, and perhaps more importantly, their primary aim is to analyse the 
relationship between the ancient Indo-European languages with respect to their 
geographical location rather than to uncover the phylogenetic structure of the 
Indo-European family tree. 

Somewhat newer, but still more than half a century old, is Ancient Indo- 
European dialects, edited by Henrik Birnbaum and Jaan Puhvel (1966). While 
some parts of that book, especially those concerned with methodological
problems, are still useful, and some of the chapters even have similar titles to 
those found in this book, it does not cover the individual subgroups systemat- 
ically but only highlights some aspects. Like Meillet's and Porzig's books, it is 
also outdated in a number of respects. 

Up-to-date from the point of view of Indo-European linguistics, the 
work by the Ringe team (e.g. Ringe, Warnow & Taylor 2002; Nakhleh, 
Ringe & Warnow 2005) is partly based on the traditional methodology 
in that it identifies significant shared innovations; in addition it
incorporates shared basic vocabulary items. In contrast to traditional
Indo-European linguistics, the Ringe team uses computational methods to
produce the best family tree based on a weighted algorithm. Since the 
work by the Ringe team has been published in articles and book chap- 
ters, rather than in book-length treatments, it does not offer much in the 
way of extensive qualitative discussion of the evidence provided by the 
individual subgroups. One of the aims of this book is to facilitate this 
kind of discussion. 

It may be worthwhile to ask why the structure of the Indo-European 
family tree attracts so much interest. For specialists it is essential to have 
an idea of the branching structure of the family tree in order to arrive at 
an adequate reconstruction of the Indo-European proto-language and its 
development into the attested Indo-European languages. All language- 
internal aspects of reconstructed Proto-Indo-European — phonology, 
inflectional and derivational morphology, syntax, lexicon — depend on 
the relationship between the individual subgroups. Any linguistic feature — 
say, the phoneme *b, the augment or the word for ‘wheel’ — must be 
viewed in the light of the family tree (Olander 2018). If the feature cannot 
be reconstructed back to Proto-Indo-European itself, it may or may not 
have been present in the proto-language, but the phylogenetic information 
should be included in the evaluation of each feature, along with systemic 
and typological considerations and the evidence of internal reconstruction. 

Other aspects of Indo-European studies are also intimately connected with 
the purely linguistic evidence. For instance, as already mentioned, the 
branching structure is very likely to be related to the geographical spread of 
early Indo-European speech varieties, and the existence of terminology for 
concepts like ‘wheel’ in the proto-language of a given linguistic subgroup is 
crucial for pinpointing the geographical and chronological location of that 
proto-language. Thus, in correlating the Indo-European proto-language and 
the prehistoric spread of Indo-European languages with the archaeological 
record — including the identification of the Indo-European homeland — the 
branching structure of the family tree plays a decisive role. As this question 
has appeal that goes well beyond specialist circles, the branching structure of
the family tree is not only highly significant in the field of Indo-European
studies but has a great impact on a broader audience as well.

The following illustrations show some of the models of the Indo-European
language family that can be found in recent publications (the nodes are named
according to the suggestion in Olander 2019a). First, though rarely made
explicit, the tree underlying much work in Indo-European studies is the
“neo-traditional model”, where the Anatolian subgroup separates first, whereas the
relationship between the remaining subgroups is undetermined, de facto
resulting in a non-hierarchical subtree for the non-Anatolian part of the family; see
Figure 1.1.

Figure 1.1 The “neo-traditional” model

A radically different structure is assumed by the Ringe group. The tree is 
binary-branching, with the subgroups leaving gradually; see Figure 1.2 (based 
on Nakhleh, Ringe & Warnow 2005: 397, tree 5A). The position of Albanian in 
this tree is uncertain. 

The Gray group also works with a binary-branching tree, but one that differs 
from the previous one except in the initial splits (Bouckaert et al. 2012, with 
a revised tree in Bouckaert et al. 2013); see Figure 1.3. The same tree is found in
Chang et al. 2015. 
Figure 1.2 Binary-branching model (Ringe group)

Figure 1.3 Binary-branching model (Gray group; Chang et al. 2015)
1.2 Terminology


If authors use the same terms for different phenomena, misunderstandings 
easily arise, especially across different disciplines. Therefore, I wish to explore 
in some detail a term that is a recurring topic for discussion in historical
linguistics yet which still causes much confusion, namely proto-language, 
a central concept in phylogenetic linguistics and in discussions of linguistic 
homelands. Most linguists would agree that the term refers to the last common 
ancestor of a group of related languages (see the discussion in Olander 2019b: 
10-12), but since “the last common ancestor” means different things to
different authors, there is often little actual agreement on the content.

In works based on cognacy databases, including Bayesian studies, I have not 
seen an explicit definition of the concept of a proto-language. However, as long 
as all items in the basic vocabulary lists of two or more speech varieties are 
cognate, these varieties are still considered to be one language. Accordingly, 
I assume, a proto-language does not dissolve as long as no word in the list is 
replaced by another word in one of the varieties. This mechanism may lead to 
undesired results. To give an exaggerated example for illustrative purposes, we 
might hypothetically assume two related speech varieties where all the basic 
words are cognate, but where, apart from that, there is only a minimal lexical 
overlap between the two varieties. Moreover, the varieties have diverged 
significantly phonologically and morphologically; for instance, one variety 
has [o] and [ʃəvø] for ‘water’ and ‘hair’, with the cognates [ˈakkwa] and
[kaˈpelli] in the other. The nominal and verbal inflectional systems are very
simple in the former variety, while the latter has a nominal system with 
numerous cases and an elaborate verbal system with several tenses, aspects 
and moods. These varieties would still be considered one uniform entity in 
frameworks that only take basic vocabulary into consideration. 
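The mechanism described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure of any particular Bayesian study: the two-item word list, the cognate-class labels and the function names are all hypothetical, and the transcriptions are modelled on the exaggerated example just given.

```python
# Toy data: meaning -> (phonetic form, cognate-class label).
# The class labels ("water_1", "hair_1") are hypothetical identifiers.
VARIETY_A = {"water": ("o", "water_1"), "hair": ("ʃəvø", "hair_1")}
VARIETY_B = {"water": ("ˈakkwa", "water_1"), "hair": ("kaˈpelli", "hair_1")}


def lexical_distance(a, b):
    """Share of shared meanings whose cognate classes differ."""
    shared = sorted(set(a) & set(b))
    if not shared:
        raise ValueError("no shared meanings to compare")
    differing = sum(1 for meaning in shared if a[meaning][1] != b[meaning][1])
    return differing / len(shared)


def counts_as_one_language(a, b):
    """Cognacy-only criterion: no replaced item means no split."""
    return lexical_distance(a, b) == 0.0


def surface_forms_match(a, b):
    """Compare the actual forms, which the cognacy coding ignores."""
    return all(a[m][0] == b[m][0] for m in set(a) & set(b))


print(lexical_distance(VARIETY_A, VARIETY_B))
print(counts_as_one_language(VARIETY_A, VARIETY_B))
print(surface_forms_match(VARIETY_A, VARIETY_B))
```

Because only the cognate-class labels enter the comparison, the phonological and morphological divergence between the two varieties is invisible to the criterion.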

In traditional historical linguistics, by contrast, a proto-language usually 
refers to the stage of a language immediately before the first linguistic change –
not only in the basic vocabulary – that does not affect all daughter languages
(cf. Eichner 1988: 11-20; Olander 2015: 18-21 with references). By this 
definition a proto-language is a uniform entity with no dialects or other 
varieties. It is clear that this somewhat idealised definition, which refers to 
only one speech variety, does not correspond to a “real” language, which 
usually comprises a number of varieties. However, the definition is unambigu- 
ous and, crucially, a proto-language is the result of the application of the 
comparative method to a set of related languages, which makes very good 
sense from the point of view of historical linguistics. 

Still from the point of view of traditional historical linguistics, it may also be 
useful to be able to refer to a group of related speech varieties that have already 
diverged from each other yet are still close enough to introduce identical or 
near-identical innovations. While some authors may conceive of this as a
proto-language, I prefer to reserve that term for the above-mentioned concept and to
use common language to refer to this latter concept (cf. Olander 2015: 18-21 
for the general terminology, and 29-31 for its application to Slavic). Applying 
these definitions, then, Proto-Indo-European is the stage before the first lin- 
guistic change in any speech variety, whereas Common Indo-European refers 
to a group of already differentiated Indo-European varieties that are still 
linguistically close enough to carry out common innovations. 

In terms of absolute chronology, the stage immediately before any linguistic 
change in the speech community (detectable by the comparative method) 
logically precedes, usually by a considerable amount of time, both the last 
stage where common innovations are still possible and the stage immediately 
before a lexical item is replaced on a basic vocabulary list. Thus apparently 
similar ways of defining a proto-language (“last common ancestor”) may, if
understood differently, lead to widely diverging results. When taking into 
consideration how significant this terminological discrepancy may be, it is 
rather surprising that it is only very rarely addressed in the literature. 

Since the homeland of a proto-language is, to most authors, the location in 
space and time where a given proto-language was spoken (cf. Eichner 1988: 
20-1; Olander 2019b: 10-12), a precise understanding of what a proto- 
language refers to is central in discussions of linguistic homelands. If different 
definitions of a proto-language end up identifying language stages separated by 
several centuries or even millennia, it is not surprising that there is
disagreement on when and where these stages were spoken. It is, in my view, quite
possible that some of the disagreement about the time and place of the Indo- 
European homeland is directly caused by this terminological confusion. 
I should add that, in my opinion, a linguistic homeland should, for practical 
purposes, refer to the location in time and space where a common language, not 
a proto-language, was spoken (in the sense of the words just discussed). 

Another term that may be useful to introduce in discussions of proto- 
languages is a para-proto-language, which refers to the related speech varieties 
spoken at the same time as a given proto-language. For instance, Proto-Indo- 
European as we reconstruct it using the comparative method is one variety 
among several varieties spoken at the same time; these varieties may be 
referred to as Para-Proto-Indo-European. While we do not know much about 
these para-languages, which have subsequently been displaced by other speech 
varieties, their earlier presence may be indicated, e.g., by phonological irregu- 
larities in words that are apparently inherited from Proto-Indo-European but 
which may actually have been borrowed from Para-Proto-Indo-European 
varieties. 

If we accept that Anatolian and perhaps Tocharian were the two first subgroups
to separate (see the next section), then there must have existed
intermediate proto-languages below the level of Proto-Indo-European but
above the level of the proto-languages of the individual subgroups — for 
instance the proto-language of the non-Anatolian subgroups, that of the non- 
Anatolian and non-Tocharian subgroups, as well as that of Italic and Celtic and 
that of Greek and Armenian. The need to be able to designate these intermedi- 
ate proto-languages has been highlighted in Olander 2019a (see also the careful 
considerations on the interpretation of a family tree, including the internal 
nodes, by Ringe, Warnow & Taylor 2002: 109; but cf. the provocative state- 
ment by Garrett 1999: 147 that “the intermediate nodes ... are nameless 
precisely because we do not need to refer to them”). I have applied the
terminological principles laid out in Olander 2019a to the figures of the present
chapter. 

It is important to acknowledge that these intermediate proto-languages are 
not defined by being residual compared to the subgroups they do not include. 
On the contrary, they are posited precisely because the subgroups descending 
from them display shared innovations, unlike the remaining subgroups (cf. 
Ross 1997: 222). If no shared innovations can be shown for a suggested 
intermediate proto-language, that proto-language is not justified in the 
model. 
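The principle that an intermediate node must be supported by shared innovations, and is not justified merely as a residue, can be sketched as follows. The data and names are hypothetical placeholders (subgroups A to D, numbered innovations), and the test used here, an innovation found in exactly the proposed members, is a deliberate simplification: it leaves aside parallel innovation, contact-induced change and later loss, all of which real subgrouping arguments must weigh.

```python
# Hypothetical data: innovation label -> set of subgroups that show it.
INNOVATIONS = {
    "innovation_1": {"A", "B"},
    "innovation_2": {"A", "B", "C"},
    "innovation_3": {"C"},
}


def supporting_innovations(members, innovations):
    """Innovations found in exactly the proposed members and nowhere else."""
    members = set(members)
    return sorted(
        name for name, carriers in innovations.items() if carriers == members
    )


def node_is_justified(members, innovations):
    """A proposed node needs at least one exclusively shared innovation."""
    return bool(supporting_innovations(members, innovations))


# A and B share an exclusive innovation, so a node above them is posited.
print(node_is_justified({"A", "B"}, INNOVATIONS))
# C and D are merely what is left over; nothing unites them.
print(node_is_justified({"C", "D"}, INNOVATIONS))
```

In this toy data set the grouping of C and D fails the test even though both lie outside the A–B node, which is the sense in which a residual grouping does not amount to a subgroup.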


1.3 Contents and Structure of the Book 


The book contains fifteen chapters. The first four chapters outline the back- 
ground to the book and address methodological issues. They also deal, from 
different perspectives, with the question of what the book is not, by discussing 
recent computational approaches to linguistic phylogenetics and why they are 
problematic. 

In this introductory chapter, the background and motivation for the book are 
outlined, and some of the terminological issues pertaining to linguistic recon- 
struction and linguistic phylogenetics are addressed. It summarises the content 
of the remaining chapters and discusses some of the perspectives they raise. 

Chapter 2, “Methodology in Linguistic Subgrouping” by James Clackson,
shows how scholars have discussed the phylogeny of the Indo-European 
language family for the last 200 years, and it sets out the methodological 
choices that face current and future researchers. Since the late nineteenth 
century, it has been generally agreed that the best supporting evidence for 
a subgroup of A and B is the existence of non-trivial shared linguistic innov- 
ations made in both A and B but not in C. There is, however, still debate as to 
what counts as non-trivial, how to identify shared innovations that arose 
through language contact, how many innovations are required to construct 
a subgroup, and whether splits are necessarily binary. These debates are further 
explained and explored in the chapter. 
Chapter 3, “Computational Approaches to Linguistic Chronology and
Subgrouping” by Dariusz Piwowarczyk, presents an overview of
computer-assisted approaches to linguistic subgrouping, highlighting advantages and
drawbacks of the individual methods and evaluating the results achieved by 
applying these approaches. Since the exact same set of changes in the same 
order in two languages can be a sign of common development and, accordingly, 
of a subgroup, the chapter pays special attention to the potential of computa- 
tional simulations of sound change. This approach is illustrated by material 
drawn from different subgroups thought to be closely related, starting from the 
most obvious ones (Indo-Iranian) to the ones that are less obvious (Balto- 
Slavic) and even controversial (Italo-Celtic, Graeco-Armenian). 
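Why the order of changes matters, and why an identical ordered sequence is informative, can be shown with a toy simulation. The two rules and the three proto-forms below are invented for illustration and are not claimed to be actual Indo-European sound laws or the implementation used in Chapter 3; the function names are likewise assumptions.

```python
import re

# Hypothetical toy rules: each rule is (regex pattern, replacement).
PALATALISE = (r"k(?=[ei])", "č")  # k > č before a front vowel
MERGE_E = (r"e", "a")             # e > a in all positions

PROTO_FORMS = ["keta", "kata", "kima"]


def apply_changes(form, rules):
    """Apply sound changes to one form, one after another, in the given order."""
    for pattern, replacement in rules:
        form = re.sub(pattern, replacement, form)
    return form


def derive(proto_forms, rules):
    """Derive the daughter forms for a whole word list."""
    return [apply_changes(form, rules) for form in proto_forms]


# Same two changes, opposite relative chronology.
print(derive(PROTO_FORMS, [PALATALISE, MERGE_E]))  # ['čata', 'kata', 'čima']
print(derive(PROTO_FORMS, [MERGE_E, PALATALISE]))  # ['kata', 'kata', 'čima']
```

The same set of changes yields different outcomes depending on their order, so two languages that agree in both the changes and their sequence agree in more than the inventory of changes alone.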

Chapter 4, “What We Can (and Can't) Learn from Computational Cladistics” 
by Don Ringe, investigates the advantages and limitations of computational 
approaches to linguistic phylogenetics. It discusses the intractable size of 
cladistic data sets, which can only be processed using computational methods, 
the relative unreliability of lexical data, and the ways in which phonological 
and inflectional data must be used together to construct and root a cladistic tree. 
It also considers how to handle language groups with only partly treelike 
diversification. Finally, the chapter critiques some recent high-profile cladistic 
analyses from several angles, exposing further pitfalls in the incautious use of 
cladistic tools. Its conclusions are only moderately positive, but are argued to 
be realistic. 

The remaining eleven chapters each deal with one of the major Indo- 
European subgroups: Anatolian, Tocharian, Italic, Celtic, Germanic, Greek, 
Armenian, Albanian, Indo-Iranian and Balto-Slavic, plus the putative Italo- 
Celtic subgroup. Fragmentarily documented subgroups such as Phrygian and 
Messapic are not treated separately, but their positions in the family tree are 
discussed in relation to the major subgroups. The chapters have a similar 
structure. Each subgroup is presented together with its attestation, geographical 
distribution etc., the evidence for the subgroup, its internal subgrouping, its 
relationship to the other subgroups and a discussion of the position of the 
subgroup in the overall family tree of Indo-European. Since the subgroups 
are very different from each other on the various parameters, the chapters 
focus on different aspects of the phylogenetic description. For instance, as 
the Italic subgroup is more diversified by its earliest attestation than 
Armenian, the section dealing with the internal structure of Italic (Section 
8.3) is much more comprehensive than the corresponding Armenian section 
(Section 12.3). 

Chapter 5, by Alwin Kloekhorst, presents the Anatolian languages and some 
of their prominent linguistic features, discussing whether they represent archa- 
isms or innovations, only the latter being indicative of an Anatolian subgroup. 
The chapter proceeds with an analysis of the internal subgrouping of the 
Anatolian languages, arguing for a Hittite subgroup and a subgroup comprising
Lydian, Palaic and the Luvic languages. After a review of the alleged “western” 
affinities of Anatolian, the chapter discusses one of the most prominent prob- 
lems in Indo-European phylogenetics over the last several decades, namely the 
question of whether the Anatolian subgroup was the first one to separate from 
the remaining Indo-European languages. It concludes that Anatolian is indeed 
the outlier in the family, and that the gap between the split-off of Anatolian and 
the rest is substantial. 

Chapter 6, by Michaël Peyrot, introduces the two closely related languages
known as Tocharian A and Tocharian B. It addresses the most important shared 
innovations that characterise these languages and thus define the Tocharian 
subgroup. This is followed by an analysis of the genealogical relationship with 
the other subgroups, especially Anatolian. The chapter also assesses the pos- 
ition of Tocharian in the Indo-European family tree, where Tocharian is often 
considered to be the second subgroup to separate, and reviews the arguments 
for and against this hypothesis. It is concluded that the question is still open, to 
some extent because the overall structure of the Proto-Indo-European verbal 
system is uncertain, which makes it difficult to distinguish innovations from 
archaisms in the descendants, including Tocharian. 

Chapter 7, by Michael Weiss, contains two main subsections. The first one 
discusses the reality of an Italo-Celtic subgroup within the Indo-European 
language family, concluding that there is enough evidence to assume 
a genuine but short-lived subgroup. The second subsection analyses the overall 
position of Italo-Celtic in the family tree. 

Chapter 8, also by Michael Weiss, offers a presentation of the Italic sub- 
group, the reality of which has sometimes been called into question, although it 
seems to be supported by a substantial number of shared innovations. The 
chapter addresses the internal subgrouping of Italic, where Latin and Faliscan 
constitute one subgroup, and Oscan and Umbrian another, the position of 
Venetic being unclear. The relationship between Italic and the other subgroups 
(except Celtic; see above) is discussed. 

Chapter 9, by Anders Richardt Jørgensen, first presents the Celtic languages
and discusses the arguments, mostly of phonological nature, for a Celtic 
subgroup. The internal subgrouping of Celtic is contested: while the existence 
of a Goidelic and a Brittonic subgroup is uncontroversial, it is uncertain 
whether Brittonic forms a subgroup with Gaulish, with Goidelic or with neither 
of them. The chapter discusses the relationship of Celtic with Germanic and the 
other subgroups (except Italic; see above). 

Chapter 10, by Bjarne Simmelkjær Sandgaard Hansen and Guus Jan
Kroonen, introduces the Germanic languages, listing the most salient features 
characterising that subgroup. The chapter discusses the relationship between 
East, West and North Germanic, concluding that the latter two subgroups are 
more closely related to each other than to the former. The subgroups that seem
to be most closely associated with Germanic are Italic, Celtic and Balto-Slavic, 
although none of them appears to form an actual subgroup with Germanic in the 
family tree. Despite being innovative in many respects, Germanic also pre- 
serves certain archaic features that suggest it may have been one of the first 
subgroups to separate from the core group. 

Chapter 11, by Lucien van Beek, presents the Greek subgroup, arguing for its 
reality based on several innovations found in all varieties of Greek. It addresses 
the complicated question of the internal subgrouping of Greek and the relation- 
ship of Greek to Macedonian, Phrygian and Armenian, concluding that 
Macedonian may possibly be classified as a Greek dialect and that Phrygian 
constitutes a subgroup with Greek. The relationship between Armenian and 
Greek is not as close as is often maintained (cf. Chapter 12). The position of
Graeco-Phrygian in the family tree, and especially the relationship with Indo- 
Iranian, is also discussed. 

Chapter 12, by Birgit Anette Olsen and Rasmus Thorsø, examines
Armenian, listing the innovations that constitute the evidence for the reality 
of the Armenian subgroup. It then analyses the relationship of Armenian to 
other subgroups of Indo-European, first of all Greek, but also Phrygian and 
Albanian, arguing that Armenian constitutes a higher-order subgroup, 
“Balkanic”, together with these three subgroups. Within the Balkanic group, 
Greek and Phrygian are most closely related, and together with Armenian they 
constitute a larger subgroup. Armenian and Albanian, on the other hand, do not 
share any exclusive innovations within Balkanic. 

Chapter 13, by Adam Hyllested and Brian D. Joseph, gives an overview of 
Albanian. After a brief discussion of the features that constitute Albanian as 
a separate subgroup and a presentation of the dialect divisions within Albanian,
the chapter analyses the relationship of Albanian to the other subgroups. 
Special attention is given to the relationship between Albanian and Greek, 
which are regarded as forming a subgroup within a Balkanic group also 
consisting of Armenian and Phrygian. 

Chapter 14, by Martin Joachim Kümmel, presents the Indo-Iranian sub- 
group, discussing the relationship between Indic and Iranian and assessing 
the difficult question of the position of the Nuristani languages. It analyses the 
position of Indo-Iranian within the Indo-European family tree, arguing that it 
may have separated relatively early and stayed in contact with several other 
subgroups. 

Chapter 15, by Tijmen Pronk, covers the Baltic and Slavic languages. It 
analyses the much-debated relationship between the two groups, concluding 
that they do constitute a subgroup together. The chapter discusses the question 
of the internal structure of Balto-Slavic, especially the position of Old Prussian 
between East Baltic and Slavic, and it analyses the relationship of Balto-Slavic 
to Germanic and Indo-Iranian, arguing that Balto-Slavic does not form
a higher-order subgroup with these or other subgroups. 


1.4 Results and Perspectives 


As should be all too clear from the preceding section, this book does not solve 
all problems related to the higher-order phylogeny of the Indo-European 
language family. On the contrary, in many respects it raises more questions 
than it answers. At the same time, it also highlights the necessity not only of 
examining in more detail individual potentially shared innovations across 
subgroups but also of zooming out and looking at the entire family, and the 
importance of methodological considerations. The latter question is the topic of 
the chapters by Clackson (Chapter 2), Piwowarczyk (Chapter 3) and Ringe 
(Chapter 4), who investigate different methodological aspects of linguistic 
phylogenetics. 

With the exception of Balto-Slavic and, to a lesser extent, Italic, the reality of 
the main subgroups is hardly ever called into question; the rare exceptions have 
not found much support in the scholarly community (e.g. the doubts about 
Greek being a subgroup expressed by Garrett 2006; cf. also the characterisation 
of Iranian as a Sprachbund by Tremblay 2005: 687). In this book, the similar- 
ities between Baltic and Slavic are considered to be so striking that they are 
dealt with together in one chapter (Chapter 15 by Pronk). Similarly, the Italic 
languages display enough common innovations that they are also regarded as 
a real subgroup of Indo-European (Chapter 8 by Weiss). Considerably less 
certain is a group consisting of Italic and Celtic; in this book the relationship 
between these subgroups is considered sufficiently important to merit a chapter 
on Italo-Celtic (Chapter 7 by Weiss), although Italic and Celtic are also 
discussed in separate chapters (Chapter 8 by Weiss and Chapter 9 by 
Jørgensen, respectively).

The view that Indic and Iranian, while clearly separate subgroups, do form 
a subgroup together is unchallenged, although the position of Nuristani within 
Indo-Iranian is still disputed, and there is no agreement on the position of Indo- 
Iranian in the overall family tree (Chapter 14 by Kümmel). 

When it comes to the higher-order grouping, however, the situation is
much less straightforward. A recurring theme throughout the book is that 
most of the individual subgroups are very difficult to place in the overall 
family tree, except Anatolian, for which the idea of an early separation has 
gained much traction in recent decades and is further supported in this book 
by Kloekhorst (Chapter 5). The position of Tocharian, often regarded as 
the second subgroup to separate, cannot be established with any certainty 
since shared innovations of the remaining subgroups are difficult to determine, 
as argued by Peyrot (Chapter 6). 
Weiss (Chapter 7) discusses the idea that Italo-Celtic may have split off
relatively early from the tree, perhaps after Tocharian. Germanic, showing 
affinities above all with Balto-Slavic and Italic, is difficult to place in the 
overall tree, as argued by Hansen and Kroonen (Chapter 10). The mutual 
relationship between the “Balkanic” languages — Greek (Chapter 11), 
Armenian (Chapter 12), Albanian (Chapter 13) as well as scantily attested 
languages such as Phrygian and Messapic — is evaluated differently by the 
authors of this book. While Greek is thought to constitute a phylogenetic 
unit together with Phrygian in all three chapters, the hypothesis of 
a Graeco-Armenian subgroup is given a negative appraisal by van Beek 
(Chapter 11), while Olsen and Thorsø (Chapter 12) are positive. A third
position is taken by Hyllested and Joseph (Chapter 13), who argue that 
Greek forms a subgroup with the notoriously difficult Albanian. Interestingly, 
the evidence for a subgroup consisting of Indo-Iranian and Balto-Slavic, 
occasionally discussed in the literature (Søborg 2020: 52; cf. Ringe, Warnow
& Taylor 2002: 103-4), is considered to be insufficient by both Kümmel 
(Chapter 14) and Pronk (Chapter 15). 

Even without decisive answers to many of the questions that were also 
being asked in Indo-European linguistic phylogenetics a decade and a half 
ago, these diverging conclusions — rather than indicating that the endeavour 
of modelling the Indo-European family tree is a failure — contribute to 
a more diverse picture of the dissolution of the Indo-European proto- 
language. When the evidence is not clear-cut, it is natural that assigning 
different weight to the various pieces of evidence leads to different conclu- 
sions. Interestingly, the different conclusions reached in the various chap- 
ters only rarely seem to hinge on discrepancies in the reconstruction of 
Proto-Indo-European and its development into the individual daughter 
languages, although one might have expected such discrepancies to play 
a significant role. 

This book examines the Indo-European language family from the point of 
view of each of the ten main subgroups of Indo-European. While a systematic 
individual assessment of the subgroups is an indispensable first step towards 
a better understanding of the internal structure of the Indo-European family 
tree, it is also clear that there is a need for a systematic reassessment of the Indo- 
European family tree as a whole. It remains an open question as to whether this 
should be done purely by applying the traditional methodology, which seeks to 
identify and evaluate significant shared innovations, or if computational 
methods, which make it possible to work with larger data sets, can contribute 
anything of true value. 

If we are able to obtain a relatively solid picture of the higher-order sub- 
grouping of the Indo-European language family, the family tree may serve as 
a vital means of solving problems of Indo-European reconstruction. Any 
reconstruction should be evaluated in the light of the family tree, and
a reconstruction suggested by several subgroups is only justified for Proto- 
Indo-European itself if it is compatible with the outlier in the family (see Ringe 
1998; Olander 2018). Without an understanding of the structure of the Indo- 
European family tree, it is also difficult to trace the prehistoric spread of the 
Indo-European languages throughout Europe and western Asia. 


1.5 Practical Remarks 


I have strived to harmonise the notation of attested and reconstructed forms in the 
individual chapters without forcing my own views on the authors. For instance, the 
purely conventional use of * 2 2^ and *i̯ u̯ has been introduced for *& é $^ and *y w
in Proto-Indo-European reconstructions. However, when the notational differ- 
ences are the result of different conceptions of the reconstructed forms, I have 
retained the authors’ preferences. Thus I have not harmonised e.g. the absence or 
presence of laryngeal colouring (e.g. pf.1sg. *u̯óid-h₂a vs. *u̯óid-h₂e ‘I know’),
vocalisation of sonorants (*u̯l̥kʷo- vs. *u̯lkʷo- masc. ‘wolf’), the notation of Proto-
Indo-European laryngeals (*h₁ h₂ h₃ vs. *h *χ *ʁ in Chapter 14), dorsals (*k g vs.
*q ɢ, again in Chapter 14), or different reconstructions of individual morphemes
(e.g. *h₂óu̯i- vs. *h₃éu̯i- ‘sheep’) across chapters. I have also retained *K g $^ for *k
& $^ and *j w for *i̯ u̯. While this practice means that the notation may differ
slightly across chapters, I believe that readers who notice the discrepancies will 
also understand their raisons d’être without being confused. 

It should be noted that the terminology for reconstructed stages of the Indo- 
European language family is not uniform. For instance, while the reconstructed 
ancestor of the family is referred to as “Proto-Indo-European” by some authors,
others prefer “Proto-Indo-Anatolian”. The terminology of the individual 
authors has been retained. For a discussion of the terminology describing the 
nodes of the Indo-European family tree, see Olander 2019b. 
2 Methodology in Linguistic Subgrouping


James Clackson 


2.1 Introduction 


If two or more languages form a subgroup of a language family, what does it 
mean? To answer this, it will be helpful to consider the case of three related 
languages, A, B and C. I shall assume that these three languages are all spoken
at the same point in time and are all derived from an unattested proto-language, 
which I shall call Proto-ABC (I shall also refer to the language family as ABC). 
If the languages A and B form a subgroup within ABC, this means that it is 
possible to reconstruct a stage intermediate between Proto-ABC and languages 
A and B, which I shall call Proto-AB. To put this in other words, there existed 
a community of Proto-AB speakers at the time when a separate speech com- 
munity spoke Proto-C, the language ancestral to C. The situation can thus be 
represented as in Figure 2.1, where languages are placed in a relationship to one 
another, much as with a family tree of genealogical descent.¹ Diagrams such as
Figure 2.1 are accordingly called “tree diagrams”.
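
The definition just given can be made concrete with a small sketch. It is only an illustration under stated assumptions: the dictionary encoding of Figure 2.1 and the function names `leaves` and `is_subgroup` are hypothetical and do not belong to any established tool. The sketch treats a set of languages as a subgroup when some reconstructed stage below the root is ancestral to exactly those languages.

```python
# Hypothetical encoding of the tree in Figure 2.1:
# each reconstructed stage maps to its immediate daughters.
TREE = {
    "Proto-ABC": ["Proto-AB", "Proto-C"],
    "Proto-AB": ["A", "B"],
    "Proto-C": ["C"],
}


def leaves(node, tree=TREE):
    """Return the set of attested languages descended from `node`."""
    daughters = tree.get(node)
    if daughters is None:  # no daughters listed: an attested language
        return {node}
    found = set()
    for daughter in daughters:
        found |= leaves(daughter, tree)
    return found


def is_subgroup(languages, tree=TREE, root="Proto-ABC"):
    """True if an intermediate stage below the root is ancestral to
    exactly these languages (and to at least two of them)."""
    target = set(languages)
    if len(target) < 2:
        return False
    return any(
        node != root and leaves(node, tree) == target for node in tree
    )


print(is_subgroup({"A", "B"}))  # A and B share the intermediate stage Proto-AB
print(is_subgroup({"B", "C"}))  # no stage is ancestral to B and C alone
```

Excluding the root reflects the point that membership of the family as a whole is not in itself a subgrouping claim; what matters is the existence of an intermediate stage such as Proto-AB.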


2.2 The History of Subgrouping 


The recognition of subgroups of the Indo-European language family precedes 
the recognition of the language family itself. Scaliger (1610) was already able to 
recognise the Romance, Germanic and Slavic families of languages, matrices 
linguarum in his terms, from shared vocabulary (notoriously using the word for 
‘god’ as a diagnostic), and earlier scholars had grouped several languages as one
in order to preserve the Biblical notion of seventy-two languages of the world.²
From the beginning of the nineteenth century, the first scholars of Indo- 
European operated with subgroups such as Germanic and Slavonic. Thomas 
Young, in the same article which saw the first use of the term “Indo-European”,
arranged the languages of the world into a three-step hierarchy: classes (of


1 Hoenigswald (1966: 3-5) discusses more complicated arrangements between three putative
languages A, B and C. 

2 Borst (1957-63) shows in detail the changing conceptions of languages and language families in
the pre-Modern era. For the background to Scaliger’s work, see Simone (1998: 163-5). 
Figure 2.1 Family tree of the ABC language family


which Indo-European was one), orders and families (1813: 256). Young’s Indo- 
European class comprised no subordinate orders, but sixteen “families”, some 
of which are familiar, German, Celtic, Latin and Sclavic, but others less so 
(Arabian, Etruscan and Cantabrian).³ The first representation of the relationship
between languages of the Indo-European family by something like a tree 
diagram is generally attributed to Schleicher, who included a schematic 
Stammbaum at the beginning of his Compendium (1861: 7), although there
was no explanation of how the groupings had been arrived at.⁴ The figure
from Schleicher's compendium is reproduced as Figure 2.2. Unlike the diagram 
given in Figure 2.1, Schleicher's tree is presented with the parent language on 
the left, and the daughter languages on the right. The “branches” of the tree are 
labelled, rather than the nodes as in Figure 2.1. 

The first Indo-Europeanists to give serious consideration to the methodology 
of language subgrouping were the “neogrammarians” (or “Junggrammatiker”), 
a group of scholars originally based around the University of Leipzig in the 
1870s.⁵ The neogrammarians are associated today principally with the idea that
sound change is regular and exceptionless, but their work on sound change was 
part of a larger programme which established a firmer basis for comparative 
linguistics. The neogrammarians were more explicit about how and why they did 
what they did than their predecessors, with publications on the techniques and 
practices of linguistic comparison.⁶ In the case of subgrouping, the first tangible
advance made by the neogrammarians was Hübschmann's demonstration that
Armenian was not an Iranian language, but a separate branch on its own.⁷


3 Compare Max Müller’s later division of Indo-European languages into divisions, classes and
branches (Müller 1861: 380, discussed by Petit 2012: 25-7). The term Cantabrian is an alterna-
tive name for Basque, as used by Adelung.

4 See Petit (2012: 22-5) for discussion of an earlier tree-diagram than Schleicher’s, by František
Ladislav Čelakovský representing the relations between Slavic languages; Blažek (2007) gives
a survey of the development of tree diagrams after Schleicher.

5 See Morpurgo Davies (1992: 226-78) for the neogrammarian school and its impact on
linguistics.

6 See for example the two books on theoretical linguistics published in 1880, Delbrück 1880 and
Paul 1880, discussed by Morpurgo Davies (1992: 245-51).

7 Hübschmann 1875; see the discussion of Hübschmann's achievement in Clackson 2016.
Figure 2.2 Schleicher’s tree diagram (Schleicher 1861: 7).


The theoretical principles and methods set out for identifying subgroups were put 
forward by Leskien (1876), partly as a critical response to the Berlin professor 
Johannes Schmidt's work on the “wave model” (Schmidt 1872). Leskien was
teacher and mentor to many of the neogrammarians, and his work on subgroup-
ing was then refined by Delbrück (1880: 135) and Brugmann (1884).⁸ The
methodological advances made by these scholars are enormous. It is to them
that we owe the principles that linguistic subgrouping proceeds through the
identification of shared innovations, rather than shared archaisms, and the
recognition that phenomena which could arise from language contact, such as
shared lexical items, should be treated with caution for subgrouping purposes.⁹

Indeed, Brugmann’s statement of what constitutes a subgroup (1884: 253) 
has often been cited, and is worth repeating once again: 


Es ist hier nicht eine einzelne und sind nicht einige wenige auf zweien oder mehreren 
Gebieten zugleich auftretende Spracherscheinungen, die den Beweis der näheren
Gemeinschaft erbringen, sondern nur die große Masse von Übereinstimmungen in
lautlichen, flexivischen, syntaktischen und lexicalischen Neuerungen, die große
Masse, die den Gedanken an Zufall ausschließt. 


8 In Morpurgo Davies's words “the neogrammarians, as often, took their cue [sic] from Leskien”
(1975: 650).

9 Morpurgo Davies (1975: 650) and Petit (2012: 29-30) associate these ideas directly with
Leskien, but as I showed in a recent paper (Clackson 2016), they are already implicit in 
Hübschmann’s (1875) work on Armenian. 
[The proof of a close commonality comes not from a single isolated or a small
number of linguistic developments occurring simultaneously in two or more areas, 
but only through a large number of innovations in phonology, morphology, syntax 
and vocabulary — a number so great as to exclude chance from consideration.] 


Brugmann's 1884 article has set the scene for the subgrouping of the Indo-
European family and other language groupings ever since.¹⁰ Brugmann dis-
cussed the possible subgroups of Indo-European, seeing only two cases where
the recognised branches of Indo-European might be grouped together: Indo-
Iranian and Balto-Slavic. It is significant that since 1884 there have been no
serious suggestions for some of the higher-order groupings seen in
Schleicher's family tree, and the Indo-European family continues to be thought
of in terms of the branches Brugmann identified.¹¹ After the neogrammarians,
Schleicher’s “Graecoitalokeltisch” and “Slawodeutsch” all but disappear from
the academic debate for the next hundred years.¹² Representations of the Indo-
European family in tree diagrams in the century after Brugmann's article
tended to show the branches of Indo-European radiating out as spokes from
a centre.¹³ Indeed, the discovery of two new branches of Indo-European in the
early twentieth century, Anatolian (of which Hittite was the earliest identified)
and Tocharian, had little initial impact on the presentation of the Indo-European
languages. Bloomfield's tree diagram (Bloomfield 1933: 315) does not include 
branches for Anatolian or Tocharian, and Meillet was able to issue a second 
edition of a book written originally in 1908 (and discussed further below) in 
1922 only noting the recent addition of the two branches (Meillet 1922: 3). 


2.3 Criteria for Subgrouping 


The reliance on common innovations rather than common retentions, and the 
need to avoid linguistic agreements that could have arisen independently, or by 
chance, have been accepted by nearly all those working on subgrouping 
methodology since Brugmann.¹⁴ It has been suggested (Dyen 1953: 581-2)
that, despite linguists’ theoretical adherence to the methodology of Brugmann, 


10 See the discussion of Dyen (1953: 580-2), who is the first to use the term “subgrouping” in
English.

11 See for example the presentation of the Indo-European languages in Fortson 2010 and Klein,
Joseph & Fritz 2017-18.

12 A Balto-Slavic-Germanic subgroup reappears in the tree-diagram of Gamkrelidze & Ivanov
(1984: 415).

13 To give just two examples, Bloomfield 1933: 312 and the representation of the Indo-European
language family in editions of the American heritage dictionary of the English language (first
published in 1969).

14 See Porzig 1954: 17-52 and Clackson 1994: 4-11 for surveys of work on Indo-European
subgrouping in the twentieth century; Ringe & Eska (2013: 256-7) and Ringe (2017: 63)
have recently reiterated the need to base subgroups on significant shared innovations.
most subgrouping has actually been carried out “by inspection”, that is to say,
through the recognition of a large number of similarities between closely
related languages (much as Scaliger was able to recognise that the Romance 
languages or the Germanic languages belonged together). This may be true at 
a very basic level, but any serious considerations of subgrouping for individual 
languages since the 1870s have proceeded through careful application of 
something like the Brugmannian criteria. This is especially the case for the 
less well-attested Indo-European varieties, such as Phrygian, Venetic or 
Lusitanian. If scholars have not used Brugmann's criteria to test the validity 
of the Germanic branch, or Slavic, it is because the innovations are numerous 
and self-evident. 

In the rest of this chapter, I shall look first at further clarifications of the 
criteria for subgrouping given above, before considering alternative models to 
the family tree. Advances in the neogrammarian methodology outlined above 
have been made in three principal areas: assessment of what counts as an 
innovation; ways to avoid “false positives”, that is, apparent shared innovations
which actually arise by chance or through language contact; and the use of
computational methods in order to survey large amounts of data (see Chapters 3
and 4).¹⁵ I shall discuss the first two of these developments in this chapter,
leaving the third to other contributors to this volume. 

How do linguists distinguish an innovation from a shared retention? In some
areas, innovations are easier to detect than in others. Once speakers of a language
have merged or partially merged two phonemes, this change cannot be undone. 
Consequently, phonological changes offer the clearest examples of innovations 
which can be recovered by the historical linguist. As Hoenigswald put it 
(1966: 7), the phonological merger is the “prototype” of the shared innovation. 
Vocabulary replacement and syntactic change provide examples where it is 
often more difficult to isolate which development is an innovation and which is 
not. If languages A and B share a vocabulary item, for example the word for 
‘man’ or a verb used to mean ‘stand’, and this vocabulary item is not found in
language C, how is it possible to ascertain whether that is a shared retention or 
a common development of the two languages? Dyen is one of only a few 
scholars to address this question directly: 


If any two or more related languages share a feature, the question arises whether this is 
a retention or an innovation. If we apply a general rule that such features are taken to be 
retentions unless there is evidence to the contrary, then a corresponding proto-feature is 
reconstructed. It follows that (borrowings being excluded) an innovation occurring in 


15 See Ringe, Warnow & Taylor 2002: 66 for the statement that the computational approach to 
subgrouping “is not intended to replace already existing methods, but to supplement them”
(emphasis in the original). 
two or more languages can be detected only if, as a proto-feature, it contradicts a proto-
feature which for some reason appears to be more ancient. (Dyen 1953: 581)


In the Indo-European domain, researchers have the advantage that the 
Anatolian languages, now generally agreed to have split first from the Proto- 
Indo-European parent, can sometimes provide a guide to what forms are more 
“ancient”. To take one example, Greek, Armenian, Albanian and Tocharian 
share reflexes of a root ‘hand’, reconstructed as *ǵʰes-r- (Greek χείρ, Armenian
jeṙn, Albanian dorë, Tocharian A tsar). For Pedersen (1924: 225), followed by
Solta (1960: 316-17), this was a significant lexical agreement between these
branches. However, it is now clear that the word is also present in the Anatolian 
branch; the presence of the word in the other languages is much more likely to 
be a shared retention rather than an innovation.¹⁶ Vocabulary items may also be
judged to be archaic rather than innovatory through their inflectional class or 
derivational patterns, or because it is possible to reconstruct a semantic shift in 
one direction rather than another. Even so, such decisions are often reliant on 
the judgement of the linguist, and in many cases it is impossible to say whether 
a lexical agreement reflects an innovation or a retention (Hoenigswald 1966:
8-9; Klingenschmitt 1994: 236).
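
The outgroup reasoning used for the word for ‘hand’ can be sketched as follows. This is a deliberately crude illustration: it assumes that Anatolian split off first and that borrowing has already been excluded, and the function name `classify_shared_item` and its labels are hypothetical. It leaves aside the other criteria mentioned above, such as inflectional class, derivational patterns and the direction of semantic shift.

```python
# Assumed outgroup: the Anatolian languages cited for the word for 'hand'.
ANATOLIAN = {"Hittite", "Hieroglyphic Luwian", "Lycian"}


def classify_shared_item(languages_with_item, outgroup=ANATOLIAN):
    """Classify a shared item as a probable retention if it is also
    attested in the outgroup, otherwise as a possible innovation."""
    attested = set(languages_with_item)
    if attested & outgroup:
        return "probable retention"
    return "possible innovation"


# The evidence before the Anatolian cognates were taken into account ...
hand_early = {"Greek", "Armenian", "Albanian", "Tocharian"}
# ... and the evidence including the Anatolian cognates.
hand_now = hand_early | {"Hittite", "Hieroglyphic Luwian", "Lycian"}

print(classify_shared_item(hand_early))
print(classify_shared_item(hand_now))
```

The label “possible innovation” is intentionally weak: absence from the outgroup does not prove that an item is innovatory, since the outgroup may simply have lost it.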

Innovations in inflectional morphology are also to some extent reliant on the 
picture the researcher has of the morphology of the parent language, and hence 
susceptible to the same criticism as the use of shared items of vocabulary for 
subgrouping purposes. Morphological innovations are, however, generally 
easier for linguists to spot, since they may be linked to phonological changes 
and thus more easily linked to a relative chronology.¹⁷ Moreover, in inflectional
morphology at least, the set of options which can be reconstructed for the parent 
language is in general much smaller than for lexical innovations. 
Morphological innovations may also be associated with a larger change in 
the morphosyntax of the language, such as the creation of a new category or the
merger of earlier categories. 

In the Indo-European language family, little use has been made of syntactic 
changes for subgrouping, partly reflecting uncertainties about the reconstruc- 
tion of Indo-European syntax, with consequent uncertainty about what counts 
as an innovation.¹⁸ In this regard it is important to note recent attempts to find
Indo-European subgroups relying on syntactic information put forward by
Longobardi & Guardiano (2009) and Longobardi et al. (2013). These
researchers, working in a Chomskyan syntactic framework, make use of a set 


16 Hittite kessar, Hieroglyphic Luwian istra/i-, Lycian izre/i- (see Kloekhorst 2008: 471-2); for the 
use of the word for ‘hand’ in a recent subgrouping enterprise, see Ringe, Warnow & Taylor 
2002: 82-3. 

17 See further below for the importance of relative chronology. 

18 Ringe, Warnow and Taylor (2002) include no syntactic features in their data set.
of syntactic parameters, which have been carefully chosen to ensure that the
selected parameters show no overlap between them. It is perhaps revealing that 
the researchers do not attempt to isolate which of the parametric changes count 
as innovations, relying on the computational approach to identify innovations 
(Longobardi et al. 2013: 148). Given the absence of suitable information about 
the parametric constraints of most of the older Indo-European languages, this 
approach has not proved to be especially helpful for refining current thinking 
on subgrouping in the language family. 

The next problem is how to avoid “false positives”, that is, shared innov- 
ations between two languages which were not made during a period of genea- 
logical unity but which come about at a later stage in the language histories. In 
terms of the hypothetical language family discussed at the beginning of this 
chapter, examples would be developments shared by languages B and C that 
took place after the break-up of the Proto-ABC community, or shared by A and 
B but made after the period of Proto-AB. Such shared developments among 
separate speech communities may reflect a situation of language contact, for 
example a period when many speakers of A also spoke B, or when speakers of 
A and B both spoke a third language, or another more complicated contact 
situation. Alternatively, a shared development made independently is some- 
times attributed to “chance”. In effect, what this usually means is that the 
innovation may reflect a universal tendency of language development, such 
as the palatalisation of dorsal consonants before front vowels or the “drift” 
from perfect formations of the verb to perfectives.¹⁹ Languages of the same
family have inherited similar structures, and it is consequently not unexpected 
that the same innovations may occur independently. 

As has been recognised since Meillet (1908: 10), understanding the relative 
chronology of changes is essential in order to determine which shared devel- 
opments are common shared innovations and which are not. In the terms of the 
ABC language family, an innovation which is apparently shared by A and B is 
not diagnostic for subgrouping if it can be shown to have taken place after 
a development that took place after A had split from B. To take an example 
from the Italic language family, Oscan and Umbrian have both undergone 
a process of syncope of short medial vowels so that, for example, an earlier 
stem *opesa- develops to an Oscan stem úpsa- and Umbrian osa-. But this
change is fed by consonant changes in Umbrian, such as the development of
intervocalic *d > rs and the palatalisation of velars before front vowels. 
Syncope, which is not an uncommon change cross-linguistically, is thus an 
innovation shared by Oscan and Umbrian but is not diagnostic for their 


19 For a survey of the sound changes affecting consonants which occur across Indo-European
languages, see Kümmel 2007; on universal paths in the grammaticalisation of tense and aspect,
see Bybee, Perkins & Pagliuca 1994, which has spawned a large body of work on processes such
as the drift from perfects to perfectives (sometimes called “aoristic drift”).
subgrouping, since it must have taken place after Umbrian changes not shared
by Oscan (see Clackson 2015: 10). 
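
The logic of the relative-chronology test can be sketched in a few lines. The sketch rests on assumptions: the function name `is_diagnostic` and the list format are hypothetical, and the order given for the two Umbrian consonant changes relative to each other is arbitrary, since the argument requires only that both precede syncope. A change counts as potentially diagnostic only if no earlier change in the same chronology was confined to some of the languages that share it.

```python
def is_diagnostic(change, chronology):
    """`chronology` lists (name, languages affected) pairs, earliest first.

    A shared change is non-diagnostic if an earlier change failed to
    affect all the languages that share it, because those languages
    had then already begun to diverge.
    """
    names = [name for name, _ in chronology]
    position = names.index(change)
    sharers = chronology[position][1]
    for _, affected in chronology[:position]:
        if not sharers <= affected:
            return False
    return True


# Relative chronology of the Umbrian example (illustrative encoding).
UMBRIAN_HISTORY = [
    ("*d > rs between vowels", {"Umbrian"}),
    ("palatalisation of velars before front vowels", {"Umbrian"}),
    ("syncope of short medial vowels", {"Oscan", "Umbrian"}),
]

print(is_diagnostic("syncope of short medial vowels", UMBRIAN_HISTORY))
```

A positive result would show only that the chronology raises no objection; the change would still have to be weighed against the possibility of contact or independent parallel development.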

In the search for diagnostic innovations for subgrouping, it may not always 
be possible to construct a relative chronology for the feature in question in 
relation to what else is known about the prehistory of the language family. 
Accordingly, linguists have looked to assess the likelihood that a particular 
innovation might be the result of contact or universal processes, rather than 
a shared innovation. The result is that some shared innovations carry more 
“weight” than others, which may be dismissed as “trivial” or “insignificant”.²⁰
Phonological processes which are frequent across the languages of the world, 
such as palatalisation, lenition, apocope, are accordingly usually dismissed as 
easily replicable and non-diagnostic. Many scholars have given greater weight 
to less common or more “unusual” sound changes, although in the absence of 
a general cross-linguistic repertoire of all known sound changes, this may rely 
more on the researcher's own knowledge than an objective assessment. Note 
also that in the Indo-European language family, the judgement of whether 
a change affecting reconstructed consonants such as “laryngeals” or “voiced 
aspirates” is unusual or not also reflects the reconstructed model of PIE which 
is used. Individual shared vocabulary which might arise from borrowing from 
languages now lost is similarly easily discounted for subgrouping purposes. 
Once again it is innovations in the field of morphology, particularly inflectional
morphology, which have been seen as especially significant. Incorporation of the
inflectional morphology of one language into another is not unknown 
in situations of prolonged or close contact, or in particular social situations, 
but it is generally accounted the most resistant area of language to borrowing.²¹
The creation of a new morpheme often reflects the grammaticalisation of a new 
category or the merger of earlier categories, and accordingly morphological 
innovations generally have significant structural importance in the languages in 
question. 

The question remains of how many innovations are enough to reconstruct
a subgroup. Brugmann rejected the reliance on a “few” innovations, calling
rather for a “large number”, but then he had not made the various further
refinements of sorting through what were certain, appropriate or significant 
innovations, using the methodology developed by later scholars and outlined 
above. Once all potential shared innovations between two branches have been 


20 The “weighting” of isoglosses is implicit already in Hübschmann 1875, and highlighted by 
Meillet (1908) and Porzig (1954). 

21 See Thomason & Kaufman 1988: 18-20 for the dismissal of earlier claims that morphology is
impervious to borrowing. Morphological borrowing is not just limited to “exotic” languages: 
the Latin first declension genitive -aes, found predominantly in texts written by writers with 
little education, shows the partial transfer of a Greek morpheme (see Adams 2003: 473-86 for 
discussion). 
carefully sifted to determine whether they meet all suitable criteria, and those
for which there remains any room for doubt have been set to one side, then is it
still justified to say that the number remaining is too small to be significant?²²
Recent family trees of Indo-European arrived at using computational cladistics, 
which may have examined a large set of data across vocabulary, morphology 
and phonological changes (as in Ringe, Warnow & Taylor 2002), show a much
greater number of branches and subgroups than most of those constructed
following Brugmann’s 1884 article.²³ In the Ringe, Warnow and Taylor tree,
binary splits are the norm, as opposed to the fan-like array of earlier trees. The 
difference partly reflects the ways in which the computational analysis is 
constructed, but it also reflects the fact that some of the subgroups are con- 
structed on what is in effect quite a small number of shared features. The Greco- 
Armenian clade, for example, is supported by only six shared lexical features, 
four of which need not be significant.²⁴


2.4 Subgroups and Prehistoric Dialect Continua 


So far in this chapter, I have largely followed the assumption that language 
change operates over uniform speech communities and that language diversifi- 
cation happens when a single speech community splits into two or more 
separate groups. However, linguistic history is rarely so straightforward. 
Clean breaks in the tree-diagram, such as that envisaged in our opening 
example between Proto-AB and Proto-C, may occur as the result of large- 
scale dispersals of a population after cataclysmic natural disasters, through 
massive migrations or other situations, but in the majority of documented 
situations, the diversification of a language into separate, mutually unintelli- 
gible, descendants takes place through periods of dialect continua, which might 
sometimes last for millennia. Indeed, the spoken varieties of Romance, 
Germanic, Slavic and several other branches of the Indo-European family 
can still be described, in whole or in part, as dialect continua. Since Schmidt
(1872), linguists have recognised that the spread of phenomena over dialect
continua is not best captured by a tree-diagram model. Schmidt himself
famously proposed an alternative to the tree diagram, the “wave theory”
(“Wellentheorie”), to explain the rippling effect of linguistic changes over
a range of mutually comprehensible varieties.²⁵

Leskien and the neogrammarians made a significant advance on 
Schmidt’s observations by pushing the period of dialectal variation back 


22 This is a criticism that has been levelled at me (see, for example, Holst 2009: 53-5) for my
“hyper-critical” analysis of the evidence for a Greek-Armenian subgroup (Clackson 1994).

23 See the survey in Blažek 2007.

24 As noted by Ringe, Warnow & Taylor (2002: 102-3) and Ringe (2017: 69).

25 See the discussion by Petit (2012: 27-9).
to the proto-language, rather than, as Schmidt had suggested, a period
when it was possible to recognise the first branches of Indo-European.
Their methodological justification for this move was that all spoken
languages contain some variation, and it is consequently likely that the
proto-languages also exhibited variation.²⁶ This line of reasoning was
followed up by scholars in the early twentieth century, such as Meillet, 
whose 1908 book Les dialectes indo-européens explored at greater length 
various shared developments of vocabulary, phonology and morphology 
that might reflect dialectal divisions within the parent language (Meillet 
1908, second edition 1922). For example, the noteworthy agreement
of Germanic, Baltic and Slavic in showing *m rather than *bʰ in
oblique case markers of the noun could only be explained, according to
Meillet, through the supposition of different dialects of
Proto-Indo-European (1908: 119).²⁷ The reconstruction of dialects of the
proto-language thus allowed historical linguists a way to account for a small
number of similarities between languages which were not sufficient on their 
own to support the reconstruction of a subgroup but were too significant to 
be ignored. As we have seen, the net effect of this move was that, in contrast 
to the recognised subgroups lower down the family tree, such as Germanic, 
Celtic etc., after Brugmann (1884), there were only two generally agreed 
“higher-order” subgroups, Indo-Iranian and Balto-Slavic. The supposition of 
a “dialectal” Proto-Indo-European could help explain the existence of a small 
number of exclusive and significant innovations shared between two or more 
branches, and also the overlapping nature of these agreements, so that some 
features might be shared between Germanic and Balto-Slavic, and others 
between Balto-Slavic and Indo-Iranian. 

The supposition of a dialectal array of Indo-European has consequently proved 
popular and in several handbooks of historical linguistics or Indo-European 
languages it is possible to find “dialect maps” of Proto-Indo-European, with the 
putative varieties ancestral to the different branches of the family laid out in 
something approximating to their geographical attestation, with Germanic at the 
top left and Indo-Iranian in the bottom right, and then isogloss lines linking or 
separating groups corresponding to shared “dialectal” features, such as the use of 
*m or *bʰ in oblique case-markers or the operation of a phonological process
known as the ruki rule.²⁸ Such maps meet with the immediate criticism that they


26 As noted by Petit (2012: 31) who cites Leskien (1876: xv): “auf dem Boden der Urheimat
[bestanden] bereits dialektische Unterschiede” [“there were already dialectal differences in the
territory of the (Indo-European) homeland”].

27 Bloomfield (1933: 314-5) also uses the example of the *m and *bʰ case markers as indications
of dialectal differences in PIE.

28 Meillet's schematic map (1908: 134) has been followed by many others. Anttila 1989: 305 is the
most sophisticated with twenty-four isoglosses included; Hock 1991: 445 has seven isoglosses, 
Mallory & Adams 2006: 73 just six. 
have the potential to include items of different time depths on a single plane. Thus,
if the Anatolian branch separated out from the other IE languages first, it is an
anachronism to include it in a dialectal area which could not yet have existed.²⁹
Meid (1975) has accordingly attempted to reconstruct a “space-time” model for 
PIE, which is in effect a “three-dimensional” dialect map, incorporating both 
temporal and dialectal variety. 

The reconstruction of a Proto-Indo-European parent language which has 
variation over time and space meets with a significant methodological objec- 
tion: it is difficult to falsify. As Ringe (2017: 65) elegantly expresses it: “new 
evidence that is at variance with evidence already in hand can often be 
accommodated on an abstract dialect ‘map’ without major revisions.”
Moreover, it significantly overplays the importance of the evidence which 
happens to survive. Since 1950 the number of early Indo-European texts 
available to scholars has been significantly increased through greater knowledge
of the Anatolian languages; the decipherment of Linear B and consequent
accessibility of the earliest stage of Greek; and discoveries and improvements 
in the understanding of a number of smaller, fragmentary languages, such as 
Gaulish, Celtiberian and South Picene. This huge increase in our knowledge of 
the languages used in the first and second millennia BCE has, paradoxically, 
made scholars more aware of what has been lost. It is clear that, since the Iron 
Age, speakers of a relatively small number of language families and subfam- 
ilies have been hugely successful in Eurasia, and their dominance has been 
responsible for the demise of countless other languages, many of which were 
Indo-European. As Ringe & Eska (2013: 262-3) note, the branches of Indo- 
European that we know about are “probably the surviving remnants of what 
was once a dialect network”, and the apparent sharp distinctions between them
are just the reflection of the “pruning” of closer neighbours. Garrett has 
suggested in a number of articles that this loss of the intermediate languages 
in a larger dialectal continuum means the dialectal array of Proto-Indo- 
European after the separation of Anatolian, and probably Tocharian, may not 
be retrievable (see Garrett 1999; 2006; Babel et al. 2013). This is not just 
because of the pruning problem but also because of the fact that the subgroups 
as we have them may reflect linguistic changes across a dialect continuum, 
which took place across already divergent dialects. A case in point is Greek, 
which shows dialectal divisions already in the Mycenaean period, but for which 
all dialects were to undergo significant shared innovations in the next 500 
years. These subsequent wave-like innovations take on a special significance 
when we have lost so many of the intervening dialects. The combination of 


29 In the map of Anttila (1989: 305), Anatolian sits in the middle separated from all other
languages by two isoglosses, one of which is drawn with a thicker line. The maps of Hock 
(1991: 445) and Mallory & Adams (2006: 73) do not include Anatolian. 
shared innovation across an already differentiated dialectal continuum and
subsequent “pruning” of intermediate dialects means that the shape of the 
original dialectal array is forever unobtainable. 

Few Indo-Europeanists have been willing to accept Garrett's arguments for 
scepticism about subgrouping, however, and most have continued to operate 
with a branching tree model, with shared innovations as the diagnostic for the 
construction of a subgroup.³⁰ The objections to Garrett's proposals are founded
on a reluctance to give weight to the “unknown unknowns”, that is the
unrecorded Indo-European varieties which gave way to the languages which 
we know about, and which may have formed a bridge between what we now 
think of as different Indo-European subgroups (see the comment recorded by 
Garrett of an anonymous referee at 2006: 48 n. 5; de Vaan 2008: 1229-30). 
These varieties doubtless existed, but we don't know how they would have
changed our picture of Proto-Indo-European as a whole or, indeed, what
they would have been like. To abandon the whole enterprise of subgrouping
because we don't know what we are missing seems a step too far. Moreover,
there has been no conclusive demonstration of an Indo-European subgroup that
has actually arisen through later convergence.³¹ It is likely that subgrouping as
currently carried out will continue, even though Garrett's arguments are 
a healthy reminder of the importance of considering the relative chronology 
of linguistic developments, and of guarding against the false reconstruction of 
a subgroup on the basis of changes which must have actually been 
convergences. 
3 Computational Approaches to Linguistic
Chronology and Subgrouping 


Dariusz Piwowarczyk 


3.1 Computational Approaches to Historical Linguistics 


Computational approaches to historical linguistics can be roughly divided into 
those pertaining to language classification (e.g. Ringe et al. 2002), chronology 
(e.g. Gray & Atkinson 2003), cognate detection (e.g. Kondrak 2002), compara- 
tive reconstruction (e.g. Hewson 1974) and the simulation of phonological 
(Baker 2008) and analogical change (Skousen 1989).¹ They usually involve
quantitative methods to calculate the relationship between the languages or to 
date the chronology of their split from the proto-language, as well as algorithms 
to align cognates, reconstruct proto-forms or simulate sound changes (includ- 
ing artificial neural networks for analogical changes). 

Although the use of quantitative methods is not new in contemporary 
linguistics — they were already being used in the 1930s — the application of 
computational methods in historical linguistics, like those used in evolution- 
ary biology, represents a novel approach which is gaining adherents but is 
mostly regarded as problematic by traditional historical linguists. This is 
partly because of the history of quantitative approaches, which included 
methods such as glottochronology and lexicostatistics that are now found to 
be largely unreliable (cf. Bergsland & Vogt 1962) or controversial at the very 


This work has been written under the research project financed by the National Science Centre
(Poland) decision number: 2018/02/X/HS2/03669. I am grateful to Thomas Olander, Don Ringe
and Michael Weiss for comments on the earlier version and to Pete Westbrook for correcting my
English. Needless to add, I am solely responsible for errors and mistakes.

1 Cf. Sims-Williams 2018 for a recent short overview of past approaches. Computational
approaches to historical linguistics have been expanding ever since the early 2000s.
Nowadays, there is an enormous growth in works dealing with computational historical
linguistics in all its aspects (language classification, cognate alignment, sound change
simulation, analogical change simulation). For an overview of the most recent approaches, see Dunn
2015 and Jäger 2018. For a short overview and assessment of quantitative approaches to
historical linguistics, see the discussion in the third edition of Lyle Campbell’s handbook 
(Campbell 2013: 447-92). 
least (cf. Hoijer 1954). Additionally, many linguists approached such methods
with extreme caution because they involved handling and converting the data 
to a machine-readable format, using statistical algorithms which seemed to 
generate a “black-box” effect rather than explain the results, and comparing 
language development to the replacement of genetic material in evolutionary 
biology. Compounding the problem was the fact that many of these early 
computational approaches were actually implemented by computational 
biologists rather than linguists. 

Even though the methods and the data used have often not been applied very 
methodically or carefully (especially in the early works dealing with linguistic 
classification and chronology), there is no doubt that computational methods can 
be useful in linguistics, simply because computers can analyse masses of data in 
a short space of time and without errors. This might not be helpful in all aspects of 
historical linguistics, but it will certainly make it easier to check, for example, the 
coherence of a hypothesis put forward, to test competing hypotheses or the results 
of the application of a sound change etc. However, it has to be borne in mind that: 
1. The computer is only a tool and the results yielded will always ultimately
depend on the quality of the algorithm, the input data and how they are
converted into a machine-readable format (together with all the judgements
made by the researchers at this point).

2. Research results from using computational methods have to be interpreted,
and the method itself is usually meant as a supplement to traditional
methods, not a replacement.

In this chapter, I will outline the history of computational approaches to
historical linguistics, concentrating on those concerning language classification
and chronology, and then describe the three major contrasting approaches to the
classification and chronology of the Indo-European languages. Following this,
I will present the method used for the computational replication of sound
change, which is a useful tool for testing the relative chronology of sound
changes and may have some bearing on the grouping of the Indo-European
languages. I conclude with some perspectives for future work.


3.2 Sources and Origins 


Although most of the contemporary computational approaches to language 
classification and chronology stem directly from the methods used in evolu- 
tionary biology (cf. Dunn 2015), they seem to have their indirect roots in the 
earlier quantitative approaches which used statistical and mathematical 
methods to calculate different aspects of language change and comparison.²


2 Only the main approaches are outlined here. For details on the different statistical approaches and
their history, I refer the reader to the overview of the use of statistics in historical linguistics by 
This clearly stems from the ongoing search for more objective bases to support
the traditional arguments from historical linguistics, which were often seen as 
subjective as they depended largely on the assumptions of the scholars who 
performed the research. 

One of the earliest quantitative approaches was a method devised by the Polish
anthropologist Jan Czekanowski (1927), who tried to calculate the similarities
between the Indo-European languages and present the results in a numerical way.
Czekanowski approached the Polish linguist Jerzy Kuryłowicz for a list of
characteristic binary features which could be used in distinguishing the different
subgroups of Indo-European (at first twenty, then twenty-two) and proceeded to
create two-feature contingency tables in which those features were counted
for every language that was being compared. Then he used Pearson's tetrachoric
correlation formula known from statistics. This allowed him to present a distance
matrix of the Indo-European languages. His approach made its way to the United
States through his student in anthropology, Stanisław Klimek, who came to study
with Alfred Kroeber in the 1930s. Czekanowski's method was adopted by both
Alfred Kroeber and Charles Douglas Chrétien (1937), who tried to count the 
similarities between the Indo-European languages using a broader range of 
features (seventy-four in total) taken from Meillet's monograph on the Indo- 
European dialects (1922). However, their results were criticised (cf. Safarewicz 
1948) for being biased from the very start because of the use of data based on 
a work which itself intended to prove the groupings of, for example, Italo-Celtic 
(Meillet 1922). Probably the most famous approach was championed in the 
1950s by Morris Swadesh (1952) in the form of glottochronology. Although 
promising at first, it was harshly criticised for working on the initial assumption 
that there is a constant rate of change in languages. Embleton (1986) tried to 
further enhance the computerised methods of lexicostatistics and glottochronol- 
ogy and concluded that 


for the traditional methods as well as the statistical methods the reconstruction of the 
topology of the tree is more accurate than the assignment of dates. Reliable dating 
information is more likely to come from historical or archeological sources, although 
the statistical methods can provide some provisional estimates. (Embleton 1986: 169-70)
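
The kind of calculation attributed to Czekanowski above can be illustrated with a minimal sketch. It builds a two-by-two contingency table from the binary feature values of two languages and converts it into a correlation with the cosine approximation to the tetrachoric coefficient, which is only one of several ways of estimating that coefficient; whether Czekanowski used exactly this form is not stated here. The language names and feature values are invented purely for illustration.

```python
import math
from itertools import combinations

# Invented binary feature vectors (1 = feature present, 0 = absent).
FEATURES = {
    "Lang1": [1, 1, 1, 0, 0, 0, 1, 0],
    "Lang2": [1, 1, 0, 0, 0, 1, 1, 0],
    "Lang3": [0, 0, 1, 1, 1, 0, 0, 1],
}

def contingency(x, y):
    """Return the cells (a, b, c, d) of the 2x2 table for two feature vectors."""
    a = sum(1 for p, q in zip(x, y) if p and q)          # present in both
    b = sum(1 for p, q in zip(x, y) if p and not q)      # only in the first
    c = sum(1 for p, q in zip(x, y) if not p and q)      # only in the second
    d = sum(1 for p, q in zip(x, y) if not p and not q)  # absent in both
    return a, b, c, d

def tetrachoric_approx(a, b, c, d):
    """Cosine approximation: cos(pi / (1 + sqrt(ad / bc)))."""
    if b * c == 0:
        if a * d == 0:
            raise ValueError("coefficient undefined for this table")
        return 1.0  # no disagreements at all
    return math.cos(math.pi / (1 + math.sqrt(a * d / (b * c))))

def similarity_matrix(features):
    """Pairwise coefficients for every pair of languages."""
    return {
        (l1, l2): tetrachoric_approx(*contingency(features[l1], features[l2]))
        for l1, l2 in combinations(sorted(features), 2)
    }

if __name__ == "__main__":
    for pair, value in similarity_matrix(FEATURES).items():
        print(pair, round(value, 3))
```

A coefficient near 1 indicates that two languages agree on most features, a value near -1 that they systematically disagree; a distance can be derived from it, for instance as one minus the coefficient.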


3.3 Computational Approaches to Language Classification 
and Chronology 


As mentioned above, most of the contemporary computational approaches to 
language classification and chronology stem from the methods used in 


Sheila Embleton (1986) and the monograph on word lists and lexicostatistical approaches to 
language comparison by Brett Kessler (2001). For an overview and a summary assessment of 
both glottochronology and early lexicostatistics, see Tischler (1973). 
biological sciences.³ The computational approaches used in evolutionary
biology were first applied to linguistics in the late 1990s and came into wider
use in the early 2000s. They usually rely on Bayesian statistical inference to
infer phylogenies. This kind of work has become very popular and has already
been applied to different families of languages.

An interesting distance-based approach was pioneered by Søren Wichmann
and his team (2018) in the Automated Similarity Judgement Program and the 
corresponding ASJP database (Wichmann et al. 2018). The database includes 
a list of forty basic words for more than 5,000 languages and can be used, for 
example, to date when the languages in one family split away from each other. 
Because it uses the Levenshtein distance and lexical data, it is often regarded
sceptically by linguists (cf. Greenhill 2011).⁴ Additionally, there are some
errors in the database itself. In the word list for Latin, for example, there is
no vowel length present, and the words are transcribed inconsistently: wenire
‘to come’ is transcribed with /w/ whereas viya ‘road’ is transcribed with /v/.
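
To make the role of the Levenshtein distance concrete, the following minimal sketch computes it with the standard dynamic-programming procedure and also shows why inconsistent transcription matters: the same word written once with w and once with v already counts as one edit. The normalisation by the length of the longer word is an assumption made here for illustration, and the spelling "venire" is a hypothetical alternative transcription, not an entry taken from the database.

```python
def levenshtein(source: str, target: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn `source` into `target`."""
    previous = list(range(len(target) + 1))  # distances from the empty prefix
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute, or keep if identical
            ))
        previous = current
    return previous[-1]

def normalised_distance(a: str, b: str) -> float:
    """Levenshtein distance divided by the length of the longer word."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0

if __name__ == "__main__":
    # One and the same word under two transcription conventions.
    print(levenshtein("wenire", "venire"))
    print(normalised_distance("wenire", "venire"))
```

Because the measure operates on character strings alone, any inconsistency in the transcription enters the distance directly, which is one reason why the quality of the word lists is so important.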

One of the most controversial aspects of computational (or more accurately 
statistical) approaches to language classification and chronology is the fact that 
they are heavily based (often even exclusively based) on lexical data. In 
contrast, the standard procedure in traditional historical linguistics is the 
analysis of phonological and morphological features, and this is probably the 
main reason that many traditional historical linguists are generally very scep- 
tical about using computational approaches for language classification and 
chronology. As pointed out by Gerhard Jäger and Johann-Mattis List in their 
recent comparison of traditional and computational methods, the crucial differ- 
ence between the classical comparative method and the approaches adopted by 
computational historical linguistics is that “the comparative method strives to 
reconstruct the true history of languages in their entirety while statistical 
approaches search for probable or at least useful models of the observed
patterns in some well-defined partial range of data”.⁵


3.4 Application to Indo-European Studies 


Apart from the lexicostatistical approach employed by Dyen et al. (1992) using 
the 200-word Swadesh lists for ninety-five languages and assuming a similar 
rate of change in all languages, probably the most famous computational 


3 For an in-depth overview of the methods, I refer the reader to Nichols & Warnow (2008) and, for
a discussion of more recent studies in the area, to Dunn 2015.

4 “The Levenshtein distance is a simple distance metric derived from the number of edit operations
needed to transform one string into another” (Greenhill 2011: 689).

5 Gerhard Jäger & Johann-Mattis List, 2019, Statistical and computational elaborations of the 
classical comparative method (unpublished manuscript), https://bit.ly/3yVktOs (accessed 20 
February 2020): 30. 
classification of Indo-European languages was developed in 2002 by a team of
experts combining linguistics (Don Ringe, Ann Taylor) and computer science
(Tandy Warnow) in the project on Computational phylogenetics in historical
linguistics (with contributions from statistician Steven Evans).⁶ Using 22
phonological, 13 morphological and 259 lexical features as coded characters, 
they were able to produce a tree with a “perfect phylogeny” algorithm that 
tracked the branching of twenty-four ancient and medieval Indo-European 
languages. However, the phylogeny was not quite perfect since the position 
of Germanic could not be determined. As it turned out from subsequent work, 
which included language contact (Nakhleh, Ringe & Warnow 2005), this was
due to the fact that Germanic was apparently in contact with the other branches 
and therefore did not fit the “perfect phylogeny”. 
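
The idea that some characters cannot be reconciled with a single tree can be illustrated with a deliberately simplified sketch. It is restricted to binary characters, for which two characters can evolve without borrowing or parallel development on one tree only if at most three of the four possible state combinations occur across the languages. The characters actually coded by Ringe, Warnow and Taylor are largely multistate and their algorithm is considerably more elaborate, so this is not their method; the languages A to D and the characters c1 to c3 are invented.

```python
from itertools import combinations

# Invented binary characters: 1 = the language shows the feature, 0 = it does not.
CHARACTERS = {
    "c1": {"A": 1, "B": 1, "C": 0, "D": 0},
    "c2": {"A": 1, "B": 1, "C": 1, "D": 0},
    "c3": {"A": 1, "B": 0, "C": 1, "D": 0},
}

def compatible(first, second):
    """Two binary characters fit on one tree unless all four state
    combinations (0,0), (0,1), (1,0) and (1,1) are attested."""
    observed = {(first[lang], second[lang]) for lang in first}
    return len(observed) < 4

def incompatible_pairs(characters):
    """List every pair of characters that cannot share a tree."""
    return [
        (x, y)
        for x, y in combinations(sorted(characters), 2)
        if not compatible(characters[x], characters[y])
    ]

if __name__ == "__main__":
    print(incompatible_pairs(CHARACTERS))
```

A pair reported as incompatible signals that at least one of the two characters must reflect borrowing, parallel innovation or a coding error, which is the kind of conflict that contact between branches would be expected to produce.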

Probably the most controversial computational approach to the subgrouping 
and chronology of the Indo-European languages was that adopted by Russell 
Gray and Quentin Atkinson (2003). In their research, Gray and Atkinson used 
the word lists of basic vocabulary for eighty-seven Indo-European languages 
compiled by Dyen et al. (1992) along with Dyen et al.’s cognacy judgements
and applied Bayesian inference to establish the dates for linguistic divergence 
of the languages analysed. They employed the algorithms for estimating the 
divergence time of DNA from evolutionary biology calibrated to the dates of 
the languages’ known split times. Using this technique, they were able to 
generate a tree in which the estimated dates of divergence of the particular 
groups of Indo-European languages were essentially in line with Colin 
Renfrew’s theory on the Anatolian origin of Indo-European languages 
(Renfrew 1987). The method was further expanded using phylogeographic 
approaches by Bouckaert et al. (2012), with the results also pointing to an 
Anatolian origin. 

The work of Gray & Atkinson (2003) and Bouckaert et al. (2012) was 
challenged by a team from the University of California, Berkeley (Chang 
et al. 2015). They tried to use the same method but with the addition of ancestry 
constraints, i.e. information relating to the fact that Latin is the parent language 
of Romance etc. Their research indicated that the chronology of the Indo- 
European splits was significantly shorter than previously thought and roughly 
in accordance with the dating of the so-called steppe hypothesis in archaeology 
as regards the homeland of the speakers of Proto-Indo-European. Although the 
authors claim that “the agreement between our findings and the independent 
results of other lines of research confirms the reliability of statistical inference 


6 I will outline here only the three main applications used for the classification, subgrouping and
chronology of Indo-European languages as represented in Ringe et al. 2002, Gray & Atkinson 
2003 and Chang et al. 2015. I refer the reader to Pereltsvaig & Lewis 2017 and McMahon & 
McMahon 2006 for a more in-depth analysis. A summary and assessment of most of these 
approaches was also given by Ringe (2017b: 67-71). 
of reconstructed chronologies” (Chang et al. 2015: 194), there are problems
inherent in the Bayesian inference itself (cf. Ringe in Chapter 4 and the 
comments by Nichols & Warnow 2008: 785 on Bayesian methods). 

At present, it is difficult to say whether computational methods can give us 
a reliable chronology of the splits of the individual branches.⁷ It is clear,
though, that they can provide us with a reliable indication of how the trees 
(topology) branched, as the work by Ringe et al. (2002) has shown. It is also 
striking that most of the approaches identify the traditionally assumed sub- 
groups (Indo-Iranian, Balto-Slavic, Italo-Celtic, Graeco-Armenian). Even with 
careful calibration, perhaps the best we can hope for is a very rough estimate. 
However, even this should be corroborated by linguistic, archaeological, gen- 
etic and historical data. 


3.5 Computational Replication of Sound Change (Computerised 
Forwards Reconstruction) 


Another approach which could yield promising results in the reconstruction of 
the Indo-European family tree, especially with regard to the relative chron- 
ology of sound changes, is the replication of sound change or computerised 
forwards reconstruction (cf. Sims-Williams 2018). The procedure of recon- 
structing forwards is not unknown in traditional historical linguistics. For 
example, Calvert Watkins (1962: 5) mentions it in his monograph on the Indo- 
European origins of the Celtic verb, where he employs the method to see how 
a Proto-Indo-European form would be regularly continued in Celtic. The 
method was also adopted by Ives Goddard (1998: 183) in his analysis of the 
Arapaho historical phonology, which, like the phonology of the insular Celtic 
languages, underwent some radical changes. 

The aim of such an approach, enhanced by using a computer, is to employ an 
algorithm that reads the given input data (proto-forms), makes the appropriate 
changes based on the changes that are usually assumed to have taken place in 
the development of the particular languages (regular sound changes) and 
generates the output which can be either manually or automatically compared 
with the actually attested forms. The purpose of this approach is to test the 
regularity of sound changes and their relative chronology. If the generated form 
is exactly the same as the attested one, then the relative chronology is assumed 
to be correct. 
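
The procedure just described can be sketched in a few lines. Sound changes are written as ordered string-rewriting rules, applied in sequence to each proto-form, and the output is compared with the attested form. Everything in the example is invented for illustration: the two rules, the proto-forms and the "attested" forms belong to no real language, and the regular-expression notation is only one possible way of coding sound changes.

```python
import re

# Each rule is (label, regular expression, replacement).
VOICING = ("t > d between vowels", r"(?<=[aeiou])t(?=[aeiou])", "d")
APOCOPE = ("loss of final vowel", r"[aeiou]$", "")

# Invented pairs of (proto-form, attested form).
PAIRS = [("pata", "pad"), ("mito", "mid"), ("kasa", "kas")]

def derive(proto_form, chronology):
    """Apply the rules in the given order (the hypothesised relative chronology)."""
    form = proto_form
    for _label, pattern, replacement in chronology:
        form = re.sub(pattern, replacement, form)
    return form

def score(chronology, pairs):
    """Share of proto-forms whose derived output equals the attested form."""
    hits = sum(derive(proto, chronology) == attested for proto, attested in pairs)
    return hits / len(pairs)

if __name__ == "__main__":
    print(score([VOICING, APOCOPE], PAIRS))  # voicing before apocope
    print(score([APOCOPE, VOICING], PAIRS))  # apocope before voicing
```

In this toy case only the order in which voicing precedes apocope derives all three attested forms, because apocope removes the vowel that conditions the voicing. Comparing the scores of competing orderings is the sense in which such a program tests a relative chronology.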

However, only two programs known to the author have tried to replicate 
sound changes comprehensively by taking into account most of the changes 
(Maniet 1985; Hartman 2003). Most of the other programs have only used 


7 See also Chapter 4 by Ringe on the limits of computational cladistics. 
8 This is also sometimes called “historical derivation”; cf. Kondrak 2002: 12-15.
fragmentary data and applied only the main sound changes. I will list and
discuss the most important ones below.⁹

The first application of historical derivation was a program by Raoul Smith 
(1969) which applied twenty-one regular rules to 650 Proto-Indo-European 
reconstructed lexemes as taken from an etymological dictionary and, through 
the application of those rules, derived the modern Russian forms from the 
Proto-Indo-European ones. However, only nine lexemes were derived regu- 
larly because too few sound-change rules were applied. 

Further attempts were made by Burton-Hunter (1976) to generate Old 
French forms from their Latin sources and by Eastlack (1977), to produce 
Ibero-Romance from its Latin source. Both programs yielded a high percentage 
of expected forms since they took into account a larger number of rules within 
a shorter period of time (Latin to Old French or Old Spanish). 

Probably one of the most comprehensive computational sound-change rep- 
lication programs was the one devised by Albert Maniet (1985). He simulated 
the changes from Proto-Indo-European to Latin (252 rules) on a corpus of 
approximately 15,000 words from Plautus. However, his research went largely 
unnoticed by both linguists and computer scientists, and the program is, 
unfortunately, no longer accessible today. 

In more recent times, in his thesis on cognate alignment and reconstruction, 
Kondrak (2002: 141-3) included a short appendix on a Perl program which
generated Polish forms from their Proto-Slavic sources. From the 626 lexemes 
taken into account, 72.5 per cent were regular. 

Hartman (2003) has been developing a program since the 1980s which 
simulates the sound changes from Latin to Spanish (with approximately 122 
rules and 1,806 coded vocabulary items) using sets of distinctive features (from 
Chomsky & Halle 1968) coded as binary strings rather than the usual string 
substitution as in most programs developed so far. As input, the etymon is “fed” 
into the computer via the keyboard or from a file. Then the individual letters are 
translated into sets of features, and changes are applied to the features in 
accordance with the programmed rules and their relative chronology. Finally, 
the features are translated back into characters and displayed using the 
International Phonetic Alphabet. 
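
The difference between this feature-based design and plain string substitution can be shown with a small sketch. Each character is translated into a binary string of distinctive features, a rule inspects and changes individual features, and the result is translated back into characters. The feature inventory, the segment codings and the function names below are simplified assumptions made for illustration and do not reproduce Hartman's actual program; the single rule voices a voiceless stop between vowels, as in Latin vita, Spanish vida.

```python
# Assumed, heavily simplified feature set; position i of each code is FEATURES[i].
FEATURES = ("syllabic", "voiced", "continuant", "labial", "coronal", "high", "low")

SEGMENTS = {
    "a": "1110001",
    "i": "1110010",
    "p": "0001000",
    "b": "0101000",
    "t": "0000100",
    "d": "0100100",
    "v": "0111000",
}
DECODE = {bits: segment for segment, bits in SEGMENTS.items()}

def has(bits, feature, value):
    """Check whether a segment carries the given feature value ('0' or '1')."""
    return bits[FEATURES.index(feature)] == value

def set_feature(bits, feature, value):
    """Return a copy of the segment with one feature value changed."""
    i = FEATURES.index(feature)
    return bits[:i] + value + bits[i + 1:]

def intervocalic_voicing(word):
    """[-syllabic, -voiced, -continuant] becomes [+voiced] between two vowels."""
    out = list(word)
    for i in range(1, len(word) - 1):
        is_voiceless_stop = (
            has(word[i], "syllabic", "0")
            and has(word[i], "voiced", "0")
            and has(word[i], "continuant", "0")
        )
        between_vowels = has(word[i - 1], "syllabic", "1") and has(word[i + 1], "syllabic", "1")
        if is_voiceless_stop and between_vowels:
            out[i] = set_feature(word[i], "voiced", "1")
    return out

def simulate(etymon, rules):
    """Encode the etymon, apply the rules in order, decode the result."""
    word = [SEGMENTS[char] for char in etymon]
    for rule in rules:
        word = rule(word)
    return "".join(DECODE[bits] for bits in word)

if __name__ == "__main__":
    print(simulate("vita", [intervocalic_voicing]))
```

The advantage of this design is that one rule covers a whole natural class: the same function voices p as well as t without listing the individual segments, whereas a string-substitution program needs a separate replacement for each.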

Hartman also enabled the program itself to be used to model sound changes 
in other languages. Recently, the hard-coded version with the Spanish model 
was made available on-line, and a new working version of the program was 
presented (Hartman 2018) but with no significant modifications. The earlier 
version of Hartman's program was used by Towhid bin Muzzafar to simulate 
the changes from Proto-Algonquian to Shawnee (bin Muzzafar 1997). Apart 


9 I also refer the reader to the enumeration and short descriptions of the programs by Sims-Williams
in his overview (2018). 
from some additional examples that he was able to find for Shawnee and minor
improvements in the relative chronology of changes, he pointed out an import- 
ant aspect of the computational replication of sound change which probably 
constitutes the main obstacle to this kind of work: 


It has been known that if diachronic sound change is regular, then it must be possible to 
demonstrate the regularity of sound change in computer models. But very few have 
actually ventured to take historical sound change rules from textbooks of well studied 
languages and develop a working computer model. And anyone who HAS ventured into 
this territory has quickly realized that there is a world of difference between the rules as 
they are written in standard linguistic notation and as they need to be written in computer 
models. (bin Muzzafar 1997: 73-4) 


Some more recent programs, apart from the ones developed by people to create 
fictitious languages (so-called “Conlangers”), include the simulation of 
Spanish from Latin by Marcel Schmuki (“ETYMO”, 2001), Proto-Germanic 
from Proto-Indo-European by Brett Kessler (“Derive”, 2004), Gothic from 
Proto-Indo-European by Roland Mittmann (2009) and the “Sound change 
transducers” by Amir Zeldes (2008). 

Sims-Williams tried some computational replication of sound change in 
Celtic historical phonology just by using the “Find & Replace” function in 
the word processor (Sims-Williams 2018: 562). By applying forty-three 
sound changes on the material of 159 selected Common Celtic forms, he 
was able to find amendments in the relative chronology of changes and 
identify usually overlooked Celtic cognates from a Proto-Indo-European 
root. He further argued that with modern programs, this kind of research 
could be pursued and expanded significantly so that it might include com- 
plete Proto-Indo-European reconstructions as input to find new etymologies, 
correct existing ones and check the relative chronology of sound changes 
(Sims-Williams 2018: 564). I agree with the author in principle, although 
I think implementing such an expanded approach is quite complicated, 
because of several key factors: 

1. circularity of the method: there are competing hypotheses on what the 
Proto-Indo-European reconstructed forms should look like, and the applied 
sound changes can be used to fit the proto-form 

2. problems in “translating” the sound changes into computational notation 

3. problems in coding the forms and characters as the input for the program 

4. lack of high-quality computational databases for Indo-European languages 
and Proto-Indo-European which would also include verbal and nominal 
paradigms 

5. lack of monographs and works that comprehensively present the complete 
relative chronology of changes from Proto-Indo-European to the attested 
languages 

6. not applying morphological changes which are usually assumed to have 
been very numerous in paradigms and which will inevitably blur 
the regular outcomes from simulations, especially if the simulation 
has a very large time span (e.g. from Proto-Indo-European to 
Latin etc.). 

To give just one example, if we take the Proto-Indo-European reconstructed 
form for ‘foot’ (cf. Wodtko et al. 2008: 530 n. 2) and apply the sound changes 
that occurred in Attic Greek (cf. Liesner 2015: 110-15), we will get the 
following result: 


nom.sg. *pód-s > *pots > *pos (preserved in cpd. τρίπος ‘tripod’) 
gen.sg. *péd-s > *pets > *pes 


Apart from the fact that some forms do not match the attested ones 
because morphological changes are not included in the simulation, we 
encounter problems with the reconstruction of the proto-form, since 
different scholars propose competing forms: nom.sg. *pṓds, gen.sg. 
*pedés (Ringe 2017a: 59) or nom.sg. *pṓds, gen.sg. *pedós (Clackson 
2007: 72). There are also problems with the assumed sound changes (cf. 
Szemerényi 1996: 116 for the view that *póds developed into *póss and 
Greek *pós). 
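
The two derivations above amount to an ordered list of string substitutions. A minimal sketch in Python, assuming a toy rule list that mirrors only the two steps shown (accents and morpheme boundaries are left out, and the rule formulations are simplifications made for the example):

```python
# Toy rules mirroring only the two steps in the derivation above; the rule
# list is an assumption for this sketch, not a full account of Greek.
RULES = [
    ("ds", "ts"),  # *d is devoiced before *s
    ("ts", "s"),   # the cluster *ts is simplified to *s
]


def derive(form, rules):
    """Apply each substitution in order and return every stage."""
    stages = [form]
    for old, new in rules:
        form = form.replace(old, new)
        stages.append(form)
    return stages


for label, proto in [("nom.sg.", "pods"), ("gen.sg.", "peds")]:
    print(label, " > ".join(derive(proto, RULES)))
# nom.sg. pods > pots > pos
# gen.sg. peds > pets > pes
```

Swapping in one of the competing proto-forms, or reordering the rules, only requires changing the input string or the list, which is what makes such a program useful for testing alternatives.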


3.6 Potential Ways to Enhance the Computational Replication 
of Sound Change 


In this section, I will try to address some of the problems involved in computa- 
tional replication of sound change. 


3.6.1 Circularity and the Assumption of Different Reconstructed Input 
Forms 


Probably the easiest way to avoid circularity would be to test the various 
competing hypotheses, which would be fairly easy if the program were inter- 
active and allowed changes to the rules and the input forms. However, the 
correct form might still be different from any of those proposed so far. 
Therefore, what remains is to try to use an algorithm that reconstructs the proto- 
form or the one that would infer the changes (based on the ones that are known 
e.g. from Latin to Romance) rather than project them mechanically in 
a replicatory manner (cf. Anderson, List & Tresoldi 2018). The exact mechan- 
ism of this approach has not been fully presented yet, but if it turns out to be 
successful, it could prove an additional help to our understanding of linguistic 
changes. 
3.6.2 Problems in “Translating” the Sound Changes into Computational 
Notation 


This is one of the most important obstacles, since linguistic changes cannot 
simply be used in computational algorithms but have to be “translated” into 
computational terms, either as simple string substitutions, as in most 
programs, or with the use of distinctive features through a binary matrix or 
parsing.¹⁰ Either way, only simple sound changes such as Latin rhotacism can 
be easily coded in terms of string substitutions. Problems occur with changes 
that operate only in certain positions in the word (since the computer does not 
know what a syllable is, this requires additional coding) or, even worse, 
changes that depend on factors outside of phonetics and phonology (e.g. the 
apocope of final *-i in Latin, which occurs in verbs and adverbs but not in nouns 
and could be triggered by the final position of the verb in the sentence, cf. Hock 
2012, if correct). 
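
Latin rhotacism, the simple case mentioned above, can be written as a single regular-expression substitution. The sketch below is an assumption about how such a rule might be coded, not a reproduction of any of the programs discussed; it uses a plain ASCII transcription in which ":" marks vowel length, and the input forms are simplified pre-forms.

```python
import re

# V = one of a, e, i, o, u, optionally followed by ":" for length.
# The first vowel is captured and copied back; the second is only looked at.
RHOTACISM = re.compile(r"([aeiou]:?)s(?=[aeiou])")


def rhotacism(form):
    """Change every intervocalic s into r (VsV > VrV), without exceptions."""
    return RHOTACISM.sub(r"\1r", form)


for form in ["hono:sis", "genesis", "est"]:
    print(form, ">", rhotacism(form))
# hono:sis > hono:ris
# genesis > generis
# est > est
```

The rule knows nothing about syllables, word classes or sentence position, so the positional and non-phonological conditions described above would each need extra annotation on the input forms.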


3.6.3 Problems in Coding the Forms and Characters as the Input 
for the Program 


Apart from translating sound changes into computational terms, the input also 
has to be coded in such a way as to encompass all the necessary features 
depending on the required changes (accent, prosody, co-articulation etc.). 
This, along with the mutual compatibility of the data, was recently addressed 
by List (2017), who proposed a universal format for coding etymological 
data. 


3.6.4 Lack of High-Quality Databases for Indo-European Languages 
and Proto-Indo-European 


This is also an important problem since the outcome of a computer simulation 
will essentially depend on the quality of the input and the programmed rules. 
There are hardly any reliable databases for Indo-European studies, but there are 
some recent projects (see Noyer 2016; Barnett & Macdonald 2018), which, 
once completed and made available, should remedy the situation to some 
extent. 


10 A string is understood as either a letter or a number, in this case only a letter. An example of 
a programmed sound change would be Latin rhotacism, where the string substitution would be 
programmed as follows: change every /VsV/ sequence into /VrV/ in all forms in the database. 
The vowel (V) would be defined as either /a(:)/, long or short respectively, /e(:)/, /i(:)/, /u(:)/, 
/o(:)/. The algorithm would then proceed to change every VsV sequence into VrV automatically 
and without exceptions, thus replicating what is usually thought of as an example of regular 
sound change. 
3.6.5 Lack of Monographs with the Complete Relative Chronology 
of Changes 


This is more of a problem for Indo-European studies in general than it is for the 
computational replication of sound change. There seems to be a lack of 
comprehensive works which present the complete (even hypothesised) relative 
chronology of changes. In most publications, only the chronology of main 
changes is given (even in e.g. McCone 1996) or the ones that are relevant to the 
discussion. This has been improving in recent years (cf. Matasović 2005; 
Olander 2015: 46-67), but much work in this area remains to be done. 


3.6.6 Not Applying Analogical Changes 


Whereas it is clear how to simulate regular sound changes, analogical 
changes are inherently irregular and occur quite unpredictably (see 
Olander 2015: 46). If that is so, it will be problematic to code them 
appropriately, other than as a substitution of the whole word within the 
chronology of changes: e.g. gen.sg. *pód-s → *pód-os with this change 
occurring only in this specific form. This can create problems if there are 
two similar forms in the paradigms and analogical change occurs in only 
one of them. In that event, morphological annotation of the forms would 
be necessary to avoid such situations. 
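
One way to picture the whole-word substitution with morphological annotation is the following Python sketch. The `Entry` type, the tag names and the chronology are hypothetical, and the two homophonous input forms are constructed for illustration; the point is only that the analogical step is keyed to a tag as well as a form, so it cannot spill over to a look-alike elsewhere in the paradigm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    form: str  # accents and morpheme boundaries omitted for simplicity
    tag: str   # morphological annotation, e.g. "nom.sg" (hypothetical tags)


def sound_change(old, new):
    """Regular change: applies to every entry, whatever its tag."""
    return lambda entry: Entry(entry.form.replace(old, new), entry.tag)


def analogy(tag, old_form, new_form):
    """Analogical change: replaces one whole word in one paradigm slot."""
    def step(entry):
        if entry.tag == tag and entry.form == old_form:
            return Entry(new_form, entry.tag)
        return entry
    return step


# Hypothetical chronology: the analogical replacement precedes the
# regular cluster changes.
CHRONOLOGY = [
    analogy("gen.sg", "pods", "podos"),
    sound_change("ds", "ts"),
    sound_change("ts", "s"),
]


def derive(entry, chronology):
    for step in chronology:
        entry = step(entry)
    return entry


print(derive(Entry("pods", "nom.sg"), CHRONOLOGY).form)  # pos
print(derive(Entry("pods", "gen.sg"), CHRONOLOGY).form)  # podos
```

Without the tag, the two identical input strings could not be told apart and the replacement would hit both.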


3.7 The Potential Use of Relative Chronology of Sound Changes 
in Subgrouping 


In an article devoted to the position of West Germanic, Don Ringe observed 
that 


the chronology of changes serves two purposes. On the one hand, languages are much 
less likely to have undergone innovations in the same order independently by chance. 
On the other hand, a sequence of changes should require more time to go to completion 
than a similar set of unrelated changes, thus ensuring that the period of linguistic unity 
demonstrated by the shared changes continued for a significant period of time. (Ringe 
2012: 33) 


If that is so, then it would be possible to use the computational replication of 
sound change in this area as well, depending on the quality and the availability 
of the data as discussed below.¹² 


11 Cf. the similar arguments made by Matasović 2005 for Balto-Slavic. 
12 For an in-depth discussion of the particular subgroups, I refer the reader to the respective 
chapters in this volume. 
3.7.1 Indo-Iranian 


The relative chronology of the main Indo-Iranian sound changes along with 
the approximate reconstruction of Proto-Indo-Iranian seem more or less 
established: Lubotsky (2018: 1885-6) gives a list of ten consecutive sound 
changes common to Indic and Iranian. Difficulties in the computational 
simulation include differing opinions between scholars on the place of 
Bartholomae's Law in the relative chronology of changes (was it also 
a Proto-Indo-European process or solely Indo-Iranian or perhaps two inde- 
pendent changes?) or the exact conditioning of Brugmann's Law and recon- 
struction of the proto-forms accordingly (e.g. the Proto-Indo-European 
reconstruction *h₃éuis or *h₂óuis ‘sheep’ depending on the assumptions 
made about the conditioning of Brugmann's Law and the absence of its 
operation in this word either due to the full grade of the ablaut or analogical 
change). 


3.7.2 Balto-Slavic 


There is considerable discussion about the relative chronology of Slavic 
and Baltic sound changes, but, although there are differences between 
scholars on the details and the exact relative chronology (cf. Matasović 
2005; Kortlandt 2008; Olander 2015), the main sound changes seem to be 
established, indicating the existence of a subgroup: Olander (2015: 47-53) 
lists eleven consecutive sound changes common to Balto-Slavic. Technical 
problems arise in the computational simulation with the “translation” of 
the changes in computerised terms concerning the Balto-Slavic accentu- 
ation and coding of the accent. Furthermore, there is a problem with the 
double reflex of Proto-Indo-European syllabic resonants in Balto-Slavic as 
either *iR or *uR, since the exact conditioning has to be stated for the 
computational simulation. 


3.7.3  Italo-Celtic 


For Italo-Celtic, sound changes seem relatively less important than morphological 
changes (cf. de Vaan 2008: 7; Weiss 2020: 493–5). Weiss (2020: 207) 
gives a list of four consecutive sound changes that could be common to Italic 
and Celtic. There are also controversies concerning whether this stage existed 
at all — compare the arguments of Meiser (2003: 30-1) and the discussions by 
Schrijver (2006: 48-53) and also recently by Zair (2018). There is hardly any 
complete hypothesis on the reconstruction of Proto-Italo-Celtic (cf. Kortlandt 
2007: 149–78) or even a balanced account of Proto-Italic (cf. van der Staaij 
1995). 
3.7.4 Graeco-Armenian 


The relative chronology of changes usually postulated for Ancient Greek and 
Armenian does not seem to support a Graeco-Armenian subgroup (cf. Kim 
2018). Mostly lexical items favour this grouping. It is also the weakest in the 
computational cladistic analysis by Ringe et al. 2002. 

From the point of view of the chronology of sound changes, only the Indo- 
Iranian and Balto-Slavic subgroups appear to be real entities. Proto-Indo-Iranian 
and Proto-Balto-Slavic also have more or less established reconstructions. 
However, in order to be able to fully investigate the relative chronology of 
sound changes, it would be necessary to compile a comprehensive list, have 
the changes translated as closely as possible from linguistic to computational 
notation and use a high-quality database with more or less complete data. 

Since it has long been a gold standard in historical linguistics that morpho- 
logical innovations should be taken more seriously into account than phono- 
logical or lexical ones, in the next section, I will discuss the possibilities and 
perspectives of including morphological changes in the computational replica- 
tion of sound change. 


3.8 Perspectives on the Inclusion of Morphological Changes 


It may be possible to expand the scope of the computational replication of 
sound change in such a way as to apply computationally generated sound 
changes along with morphological changes to the complete lexicon of the 
Proto-Indo-European language as it is reconstructed today in order to generate 
the main Indo-European languages. The program would apply the sound 
changes and the analogical changes in their relative chronology to the lexicon 
of Proto-Indo-European and generate output which in turn would be compared 
with the actual data relating to those languages. With this approach, the amount 
of regular sound change from a more or less complete lexicon would be 
uncovered along with the exact interferences causing irregularities — errors in 
the formulation, chronology or translation into computational terms of the 
programmed sound changes, borrowing and, especially, analogy. 
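
The generate-and-compare loop described here can be sketched as follows. The three-word lexicon, the transcription (":" for length) and the single rhotacism rule are toy assumptions standing in for a full rule set and database; the last entry is a form whose attested shape is usually explained by analogy.

```python
import re

# Toy rule set: Latin rhotacism only (an assumption for this sketch).
RULES = [(re.compile(r"([aeiou]:?)s(?=[aeiou])"), r"\1r")]

# Toy lexicon: simplified pre-form -> attested form.
LEXICON = {
    "flo:sis": "flo:ris",
    "genesis": "generis",
    "hono:s": "honor",
}


def generate(form, rules):
    """Apply all rules in their stated order."""
    for pattern, replacement in rules:
        form = pattern.sub(replacement, form)
    return form


def evaluate(lexicon, rules):
    """Return the number of regular outcomes and the list of mismatches."""
    mismatches = []
    for proto, attested in lexicon.items():
        predicted = generate(proto, rules)
        if predicted != attested:
            mismatches.append((proto, predicted, attested))
    return len(lexicon) - len(mismatches), mismatches


regular, mismatches = evaluate(LEXICON, RULES)
print(f"{regular} of {len(LEXICON)} forms are regular")  # 2 of 3 forms are regular
for proto, predicted, attested in mismatches:
    print(f"{proto}: predicted {predicted}, attested {attested}")
# hono:s: predicted hono:s, attested honor
```

Each mismatch is then a candidate for one of the explanations listed above: a misformulated or misordered rule, a borrowing or an analogical change.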

This approach could potentially address a very direct and practical question 
of interest to every practising historical linguist: whether one analogical solu- 
tion is more probable than another. The usual answer to this question depends 
on one's view of the system of the language in which the change occurred. 
However, different scholars might view a certain language system (in its earlier 
or even reconstructed phases) differently and so pose different analogical 
explanations with their own models and motives. They will look for parallel 
developments and typologically similar changes in the material they are work- 
ing on and, most importantly, in their previous experiences. We can deem one 
solution of analogical remodelling as more plausible than another by providing 
other analogical solutions of exactly the same type, with similar models and 
motives along with an in-depth analysis of the synchronic situation. Warren 
Cowgill, in his work on universals in Indo-European diachronic morphology, 
noted that with regard to small-scale innovations “[a] sufficiently large collec- 
tion of such individual changes, appropriately classified, should give linguists 
measure of the relative plausibility of different solutions for problems in 
historical grammar" (1966: 115). He continues: 


At present each linguist judges the plausibility of a newly proposed solution pretty much 
by what he happens to remember of the morphologic innovations which during his 
career he has been led, for one reason or another, to accept as plausible. A reasonably 
objective standard of plausibility should make it easier for historical linguists to agree on 
solutions for problems of historical morphology that at present are still disputed. 
(Cowgill 1966: 115-116) 


Using a computational algorithm and an electronic database with word forms 
and the phonological and morphological changes which occurred in their 
development, it would be possible to create a virtually complete picture of 
the phonological and morphological system of the language at every stage of its 
development and to investigate any possible phonetic and analogical changes. 
Because the reconstruction of the proto-language and each of its stages of 
development remains hypothetical, its validity and accuracy can only be 
checked against the general typology of both synchronic language systems 
and types of diachronic changes along with the internal coherence of the 
system. Most notably, the compatibility of every single sound change can be 
checked against the hypothesis by applying it to the lexicon of the language. 

Such an approach would allow us to formulate hypotheses concerning the 
relative chronology and tendencies of sound change and analogical levelling 
based on fairly complete empirical data. The results would confirm or chal- 
lenge the existing theories on sound change and analogical remodelling and 
could form the basis for comprehensive historical grammars in the future 
which, with the expansion of integrated corpus linguistics, could encompass 
all corpora of texts from all periods of the documented language development. 
Such a large database would enable scholars to pursue further research in the 
area, allow the explicit discussion of competing hypotheses and serve as an 
educational tool. Additionally, the method itself could be applied to other 
language families, thus forming the basis for research on universal tendencies 
in language change. Moreover, it would break the so-called “handbook trad- 
ition” mentioned by Eichner (1992: 61), whereby a sound change is illustrated 
only by a handful of examples (usually the same in various historical gram- 
mars) and in order to find more of these, one has to consult an etymological 
dictionary. 
In fact, a recent project carried out by Jouna Pyysalo (2017) has managed to 
achieve most of what is described above. With the use of computational 
simulation, the project aims to generate all the forms of the Indo-European 
languages from their reconstructed proto-forms. However, the author uses an 
idiosyncratic reconstruction of Proto-Indo-European which deviates significantly 
from the current communis opinio, i.e. with only one laryngeal, at the 
same time basing his argument against the classical laryngeal theory only on 
Anatolian data and completely ignoring the data relating to Ancient Greek (cf. 
Janhunen & Pyysalo 2018). If the input proto-forms are hard-coded so that they 
cannot be modified, this project will only serve to present the author's own 
views on the subject. 


3.9 Perspectives on the Computational Methods 


Czekanowski, Kroeber and Chrétien pursued an interesting way of handling 
language classification back in the days when such methods were far from 
popular or even acceptable. They were often very harshly criticised by lin- 
guists, to the extent that virtually nobody followed their lead to improve the 
method. Just as Czekanowski, Kroeber and Chrétien were pioneers in the use of 
statistics for language classification applied to Indo-European, so were Ringe 
et al. and Gray and Atkinson pioneers in the use of computational methods in 
the same area. Even though their work is relatively recent, a large amount of 
new research has been done in the field, which has become so popular that some 
scholars argue that historical linguistics appears to have taken something of 
a quantitative turn. Indeed, new methods are being implemented in an attempt 
to meet the standards of traditional historical linguistics in paying careful 
attention to the annotation of the data and to detail in general, while at the 
same time taking advantage of the replicability, robustness and formality of the 
computational approach. Progress is being made in modelling language char- 
acteristics and change using algorithms, and more attention is being given to 
making the programs and the data openly accessible, in a more or less standard 
format, easy to use by non-computational scientists and, importantly, annotated 
jointly by experts in the respective fields. It seems that only through combining 
qualitative and quantitative methods can further progress be made in the field of 
Indo-European linguistics, and the current thinking is that such progress is only 
possible if scholars from different disciplines contribute collectively. 

The advantages of the computational approach are its speed, error-free 
processing of data and ability to handle large amounts of data. I would further 
argue that, thanks to the computational approach and the explicit presentation 
of the material, it would be much easier to compare different linguists’ com- 
peting hypotheses, even for people from outside the exact field of 
specialisation, thus making the whole enterprise of Indo-European linguistics 
easily accessible for interdisciplinary studies. 
4 What We Can (and Can't) Learn from 
Computational Cladistics 


Don Ringe 


For more than twenty years now various teams of colleagues have been 
pursuing computational work on the cladistics of Indo-European. I am partly 
to blame, since my collaboration with Tandy Warnow helped to make such 
research visible and attractive. To at least some observers, it has not always 
been clear that what we can learn from computational cladistics is limited. This 
chapter is an attempt to explore those limits. 


4.1 Outgroup Analysis 


I begin with a well-known principle of traditional cladistics that should be kept 
in mind as background for a consideration of computational methods, namely 
outgroup analysis. A simple example is given in Figure 4.1. 

The reflexes of PGmc. *tūna- ‘enclosure’ are always a-stems, reflecting 
pre-Proto-Germanic o-stems, but they are neuter in Norse and masculine in West 
Germanic, so the gender of the proto-form cannot be recovered by evidence 
internal to Germanic. The reflexes of the corresponding word in Celtic are 
always neuter, but the Old Irish word is an s-stem, while the Gaulish word is an 
o-stem — at least to judge from the Latinized form recorded in place names. 
Leaving that last problem aside (since this is just a demonstration of method), 
we would have to say that the gender, but not the stem class, of the Proto-Celtic 
form can be recovered by internal evidence. But if the two problems are 
considered together, the simplest solution is that the earliest recoverable form 
of the word was *dūnom, a neuter o-stem, because that hypothesis requires only 
two changes: a shift of gender in West Germanic, and a shift of stem class in 
Old Irish.¹ 


I am grateful to Bob Berwick and Tandy Warnow for helpful discussion of various parts of this 
chapter. All errors and infelicities are mine. 

1 Note that this conclusion is valid regardless of whether the Celtic and Germanic words reflect 
common inheritance or early borrowing of a Celtic word into Proto-Germanic. I emphasize that 
because it is one illustration of an important point: subgrouping and establishing a genetic 
relationship (of words or languages) in the first place are different problems, and they cannot 
be solved by the same methods. 
[Figure 4.1: tree diagram linking the two proto-forms and their reflexes] 

PGmc. *tūna- 
    ON (pl.) tún (neut. a-stem) 
    OE, OF, OS tūn, OHG zūn (masc. a-stem) 
PCelt. *dūn... (neut.) 
    (Latinized) Gaulish -dūnum (neut. o-stem) 
    OIr. dún (neut. s-stem) 


Figure 4.1 Outgroup analysis of PGmc. *tūna- and PCelt. *dūn... 


4.2 Computational Cladistics 


In this textbook illustration, we took the shape of the cladistic tree for granted as 
a basis for investigating another type of problem. Cladistics inverts that, using 
details of the linguistic data to find the true tree.² Nevertheless, to a large extent 
(though not completely), a problem in computational cladistics uses the same 
mathematical principle as outgroup analysis. The most widely employed criterion 
for tree optimization — that is, for choosing the best of the trees that the software 
returns — is maximum parsimony: the optimal tree is the tree on which the smallest 
number of individual changes is required to account for the observed data. That is 
essentially the line of reasoning employed in the illustration above. Alternative 
criteria can be (and are) employed. For instance, the maximum compatibility 
criterion looks for the tree on which the greatest number of characters (that is, 
words or features) are “compatible” with the tree, i.e. the maximum number 
which fit the tree with no parallel development and no backmutation. In principle 
the two criteria are quite different. The maximum compatibility criterion can yield 
an optimal tree in which there is a great deal of backmutation and parallel 
development so long as it's confined to, say, 1 per cent of the words in the 
comparative wordlist; they can be as messy as you like, as long as there are only 
a few of them. Maximum parsimony yields the tree with the smallest amount of 
mess overall, regardless of how it's distributed. But in practice the two criteria 
usually give similar results, and if the amount of parallel development and back- 
mutation in a dataset is very small, the results of the two methods converge. 
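
The parsimony count for a single tree can be illustrated with the data of Figure 4.1. The sketch below uses the small-parsimony procedure generally attributed to Fitch, which is my choice for illustration and is not named in the text; the tree shape is taken for granted, West Germanic is treated as one taxon, and the Germanic a-stems are coded as o-stems, following the statement above that they reflect pre-Proto-Germanic o-stems.

```python
# Tree as nested pairs; the leaves are taxon names (labels chosen for this sketch).
TREE = (("ON", "WGmc"), ("OIr", "Gaulish"))

CHARACTERS = {
    "gender": {"ON": "neut", "WGmc": "masc",
               "OIr": "neut", "Gaulish": "neut"},
    "stem class": {"ON": "o-stem", "WGmc": "o-stem",
                   "OIr": "s-stem", "Gaulish": "o-stem"},
}


def fitch(tree, states):
    """Return (candidate states at this node, minimum number of changes below it)."""
    if isinstance(tree, str):            # leaf: its observed state, no change
        return {states[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch(left, states)
    right_states, right_changes = fitch(right, states)
    shared = left_states & right_states
    if shared:                           # the daughters can agree: no new change
        return shared, left_changes + right_changes
    return left_states | right_states, left_changes + right_changes + 1


total = 0
for name, states in CHARACTERS.items():
    root_states, changes = fitch(TREE, states)
    total += changes
    print(name, sorted(root_states), changes)
# gender ['neut'] 1
# stem class ['o-stem'] 1
print("total changes:", total)  # total changes: 2
```

The result matches the outgroup argument: a neuter o-stem at the root, with two changes. A maximum-parsimony search repeats this count for every candidate tree and keeps the tree with the smallest total.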

Cladistics involves more than the inverse of outgroup analysis, however. 
For one thing, automation is necessary because of the sheer size of the 
problem. As the number of languages compared — in cladistic terms, the 
number of taxa — increases, the number of possible binary-branching trees 
that must be considered increases exponentially. If n is the number of taxa, 
the numbers of possible rooted and unrooted binary branching trees are given 
by the formulae in Table 4.1 (Dobson [no date]; Embleton 1986: 28-9 with 
references). 


2 For an introduction to computational cladistics and the terminology that computational cladists 
use, see Nichols & Warnow 2008. 
Table 4.1 Number of rooted and unrooted binary branching 
trees (n = number of taxa) 


Number of distinct rooted binary trees: $(2n-3)(2n-5)\cdots 5\cdot 3\cdot 1$ 
Number of distinct unrooted binary trees: $(2n-5)\cdots 5\cdot 3\cdot 1$ 


If the problem under investigation is large enough to be interesting, it's not just 
that one human lifetime is too short to do the calculations by hand (though that 
can be true); it's also that the human mind can't keep track of all the possibil- 
ities. The computer, of course, can. 
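
To give a sense of the scale, the two products in Table 4.1 can be evaluated directly. The function names below are mine; the formulae are those of the table.

```python
def odd_product(k):
    """Return k * (k - 2) * ... * 5 * 3 * 1 for odd k (1 if k < 3)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def rooted_trees(n):
    """Number of distinct rooted binary trees on n taxa (n >= 2)."""
    return odd_product(2 * n - 3)


def unrooted_trees(n):
    """Number of distinct unrooted binary trees on n taxa (n >= 3)."""
    return odd_product(2 * n - 5)


for n in (4, 10):
    print(n, rooted_trees(n), unrooted_trees(n))
# 4 15 3
# 10 34459425 2027025
```

With ten taxa, the number of main Indo-European branches, there are already more than two million unrooted trees to compare.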


4.3 Problems with Character Evidence 


Traditional cladistics is also beset by another problem that computers are 
ideally suited to solve. Consider the types of characters used in linguistic 
cladistics. Lexical characters (vocabulary) are actually the least reliable, 
because parallel semantic development is rampant — words meaning ‘person’ 
often come to mean ‘man’ and then ‘husband’, for instance — and undetect- 
able borrowing between closely related languages is a real problem. 
Moreover, we expect phonological and morphological characters to give 
a better picture of linguistic descent because they are grammatical, and 
grammar is acquired in native language acquisition in the first few years of 
life and resists modification later in life. But phonological and morphological 
characters have weaknesses of their own as well as strengths. Even if they 
are based on mergers (not simply on phonetic changes), phonological char- 
acters are usually "natural" and easily repeatable, making parallel develop- 
ment a significant problem (Ringe, Warnow & Taylor 2002: 66-7; Ringe & 
Eska 2013: 257-9); their strength is that mergers are irreversible, which 
means that the direction of a tree edge in time can be established. By 
contrast, changes in inflectional morphology are hardly ever repeatable in 
detail (except for loss of a morphological category or marker, which occurs 
often); but it is often difficult to figure out which state of a morphological 
character is original and which states are innovative. 

Of course there are traditional ways around these problems. Though the 
probability of any single sound change recurring independently is usually fairly 
high, the probability of a whole set of sound changes — especially an ordered 
set — recurring independently is far lower. The most distinctive sound changes 
that define the Germanic subgroup are a case in point. The following seven 
interrelated sound changes occurred in the prehistory of every well-attested 
Germanic language (Ringe 2017: 113-27, 147–50): 
a. PIE *p t k kʷ > fricatives *f θ x xʷ unless an obstruent immediately preceded; 

b. PIE *b d g gʷ > *p t k kʷ simultaneously with or after (a); 

c. PIE breathy-voiced *bʰ dʰ gʰ gʷʰ > fricatives *β ð ɣ ɣʷ; 

d. *f θ s x xʷ > *β ð z ɣ ɣʷ if not word-initial and not adjacent to a voiceless 
sound and the last preceding syllable nucleus was unaccented (“Verner’s 
Law”); must have followed (a), which fed it; 

e. *β ð ɣ ɣʷ > *b d g gʷ after homorganic nasals, and *ð > *d also after *l and 
*z (at least); must have followed both (c) and (d), which fed it, and also (b), 
which it counterfed; further, *β ð > *b d word-initially, which likewise must 
have followed (b) for the same reason; 

f. stress was shifted to the first syllable of the word; this must have followed 
(d), because it both created and destroyed triggering environments for (d); 

g. unstressed *e > *i unless *r followed immediately; must have followed (f), 
which both fed and bled it. 

We have no basis for calculating the probability that each sound change would 

occur in a given line of descent within a given time period, but it turns out that 

that does not matter, because a Bayesian approach to probabilities will yield an 
overall result in the right ballpark. Let us estimate the probability of each sound 
change, do the relevant calculation, and try to assess the results (see Ringe & 

Eska 2013: 259-61). 

(a), or something very like it, occurred also in Armenian, thus in two of the ten 
well-attested subgroups of IE; let us therefore assign it a probability of 0.2 
(two in ten); 

(b) occurred also in Armenian and Tocharian, so we assign it a probability 
of 0.3; 

(c) might have occurred also in Proto-Italic (Meiser 1986: 38), so we assign it 
a probability of 0.2; 

(d) or a very similar change, occurred in fifteenth-century English (Jespersen 
1909: 199—208); given its complexity, we might assign it a probability 
of 0.1; 

(e) is commonplace (cf. the allophones of voiced obstruents in modern 
Spanish) and so should be assigned a high probability, say 0.5; 

(f) also occurred in Proto-Italic and Proto-Celtic, so we assign it a probability 
of 0.3; 

(g) is a common and repeatable merger, so a probability of 0.5 is again 
reasonable. 

Using these crude estimates, we can calculate the probability that all seven of 
these sound changes would occur in a single line of descent by chance as 


0.2 × 0.3 × 0.2 × 0.1 × 0.5 × 0.3 × 0.5 = 0.00009, or about one in 11,111. 


Of course, our estimates of the individual probabilities might be inaccurate. But 
because they are all between 0.1 and 1, the estimated cumulative probability 
cannot be more than about an order of magnitude too small; it could easily be 
too large, in which case we are constructing an argument a fortiori. However, 
we are not finished with our calculation. We can establish several relative 
chronologies among these seven changes: 


(a) > (d) > (f) > (g) 
(a) > (b) > (e) 

(c) > (e) 

(d) > (e) 


Consider only the first and longest of those chronologies. The sound changes 
involved could have occurred in any order, yet they did occur in this one. The 
number of orders in which four events could occur is 4 x 3 x 2 x 1 = 24. To 
account for the fact that the changes occurred in only one of the possible 
orders we need to divide our above result by 24, yielding 0.00000375, or 
about one in 266,667. Since only about 7,000 human languages are attested, 
the fact that all these sound changes occurred, in the chronological order 
reconstructible, in the prehistory of every Germanic language can only mean 
that they occurred once, in the common ancestor of those languages. This is 
an overwhelming validation of the Germanic subgroup by sound change 
alone. 
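
The arithmetic above can be checked mechanically. The following Python sketch only multiplies the seven crude estimates given in the text and divides by the number of possible orders of the four changes in the longest chronology; the variable names are illustrative and are not part of the original argument.

```python
from math import factorial, prod

# Crude probability estimates for sound changes (a)-(g), as given in the text.
estimates = {"a": 0.2, "b": 0.3, "c": 0.2, "d": 0.1, "e": 0.5, "f": 0.3, "g": 0.5}

# Probability that all seven changes occur in one line of descent by chance.
p_all = prod(estimates.values())

# The longest relative chronology, (a) > (d) > (f) > (g), picks out one of the
# 4! possible orders in which those four changes could have occurred.
orders = factorial(4)
p_ordered = p_all / orders

print(f"all seven changes: {p_all:.5f} (about one in {round(1 / p_all):,})")
print(f"in the attested order: {p_ordered:.8f} (about one in {round(1 / p_ordered):,})")
```

Because every estimate is a rough guess, the point of the calculation is the order of magnitude of the result rather than its exact digits.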

To validate the Germanic clade, then, we do not need computational 
methods. Unfortunately not every potential clade offers us such clear 
phonological evidence; in effect, we got lucky with Germanic. Using 
characters based on inflectional morphology requires an even greater 
degree of luck: we need to find a shared morphological character state 
which, because of its details, is overwhelmingly likely to be an innov- 
ation. Once again Germanic is a case in point. The “weak” preterite bears 
a superficial resemblance to (1) the Gaulish t-preterite; (2) the Oscan -tt-
perfect; (3) the Lithuanian imperfect in -davo-. But the details of all four 
formations are so different that they must have arisen independently. It 
follows that the weak preterite must be a Germanic innovation, and that 
too validates the clade. Some clades provide morphological evidence of 
that quality; unfortunately, many others do not. 

However, computational cladistics can extract the greatest amount of infor- 
mation from phonological and morphological characters by combining them. 
We use both sets to find the best unrooted tree; because the tree is unrooted at 
this initial stage of the investigation, the fact that we might not be sure which 
states of morphological characters are innovative is not a problem. Then we use 
the probative phonological characters, which are usually few, to root the tree, 
relying on the fact that mergers are irreversible. 

In principle, then, computational cladistics should be able to solve any
subgrouping problem for which there is enough clear evidence in the data. 
Unfortunately that condition frequently remains unmet. Still worse, many 
datasets present the researcher with conflicting evidence. There are at least 
two rather different reasons for that, conceptually distinct even though they 
shade into one another in practice. 


4.4 Phenomena Incompatible with Cladistic Trees 


On the one hand, it is possible that the diversification of a family of languages 
simply hasn't been treelike. In that case an appropriate algorithm will find 
several possible trees, but none of them will be very good by any optimization 
criterion, and each will be bad in a different way. Early in the line of work that 
resulted in Ringe, Warnow & Taylor 2002, we decided to find out what such 
a case would look like in detail. To that end we did a cladistic analysis of some 
modern West Germanic languages, with Danish and Swedish as an outgroup, 
using PAUP*, a program designed to find the most parsimonious tree (see 
above). We actually expected the analysis to fail, because it's clear that most 
West Germanic languages have been in contact, trading material and influen- 
cing one another, for as long as they've existed; in fact ocular inspection of the 
data shows that there are so many overlapping patterns of cognation that no 
perfect phylogeny (PP, i.e. a tree in which no character exhibits parallel
development or backmutation) can exist for this dataset. The computational 
analysis did fail spectacularly (and not only in ways that we had foreseen, 
because we hadn't paid enough attention to Scandinavian influence on English 
and Danish influence on North Frisian). Our results are given in Table 4.2. 


Table 4.2 Best possible parsimony scores for West Germanic

Best possible parsimony score for the data: 262

Actual score   Tree assigned each score
309            (Eng (WFris NFris)) (Neth HG)
313            ((WFris NFris) (Neth HG)) Eng
315            ((Eng (WFris NFris)) Neth) HG
319            ((NFris HG) (WFris Neth)) Eng
329            (NFris HG) (Eng (WFris Neth))
335            (((WFris HG) Neth) Eng) NFris


The best possible parsimony score is simply the number of state-to-state
transitions within characters; if a PP had existed, that would have been its
parsimony score. The best trees that we were able to find all exhibit more than
forty additional state transitions, reflecting either parallel development or
backmutation. For technical reasons we cannot guarantee that the algorithm 
found the best available tree, so in principle we cannot exclude the possibility 
that a closer approximation to a PP for this dataset can be found, but in practice 
that is highly unlikely. It can be seen that the three least bad trees are plausible: 
to put it in terms that are in part anachronistic, the first groups Anglo-Frisian 
against Franconian, the second groups English against continental West 
Germanic, and the third groups Ingvaeonic against High German. But their 
parsimony scores are all mediocre, and numerous characters are incompatible 
with each tree. Still worse, the next three trees have only modestly less 
acceptable scores but are all implausible, since all three split the Frisian 
languages. This is what total failure, because the diversification of a family 
was not treelike, looks like.³
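
As a rough check on the figures in Table 4.2, the following Python sketch computes how many state transitions each tree needs beyond the best possible score, and flags the trees that separate West Frisian from North Frisian. The scores and tree labels are copied from the table; the dictionary layout and the string test for the Frisian clade are only illustrative assumptions.

```python
# Best possible parsimony score and actual scores, as listed in Table 4.2.
BEST_POSSIBLE = 262
scores = {
    "(Eng (WFris NFris)) (Neth HG)": 309,
    "((WFris NFris) (Neth HG)) Eng": 313,
    "((Eng (WFris NFris)) Neth) HG": 315,
    "((NFris HG) (WFris Neth)) Eng": 319,
    "(NFris HG) (Eng (WFris Neth))": 329,
    "(((WFris HG) Neth) Eng) NFris": 335,
}

# Extra transitions (parallel development or backmutation) required by each tree.
excess = {tree: score - BEST_POSSIBLE for tree, score in scores.items()}

# A tree "splits Frisian" if it does not contain the clade (WFris NFris).
splits_frisian = {tree: "(WFris NFris)" not in tree for tree in scores}

for tree in scores:
    note = "splits Frisian" if splits_frisian[tree] else "keeps Frisian together"
    print(f"{scores[tree]}  +{excess[tree]:<3} {tree}  [{note}]")
```

On these figures even the least bad tree needs well over forty extra transitions, and the three worst trees are exactly the ones that split the Frisian languages.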

The other possibility is that there is a treelike signal in the data, but that it has 
been obscured by undetectable borrowing between the languages. There is 
probably more than one way to approach that problem, but the most straight- 
forward is to take several of the best trees and see how many “contact edges" 
you need to add to make all the data compatible with the tree. Since each
contact edge must represent a historical episode of language contact, they must 
be posited so as to be compatible with what is known about the geography of 
the languages in question and the relative chronology of the family's
diversification events. Nakhleh, Ringe & Warnow 2005 is the only attempt to do that that
I am aware of; interested readers should consult that work for further discussion.

Tree-networks like these can arise in more than one way in the real world, of 
course. “Clean speciation" followed by renewed contact and linguistic borrow- 
ing that cannot be detected (because no crucial sound changes were involved) is 
one way. Another possibility is that the diversification of the family was 
actually network-like, but only non-adjacent members of the dialect network 
survive; in that case the lateral edges can represent innovations which spread 
through the dialect network as it was diversifying, and their sparseness is 
simply an artefact of the originally non-adjacent positions of the survivors. In 
general, cladistics cannot differentiate between those two scenarios. 


4.5 Time Depth in Linguistic Cladistics 


Thus far I have been discussing cladistics sensu stricto, i.e. the recovery of the 
branching tree that correctly reflects a language family's diversification. 
Numerous researchers have claimed that it is also possible to recover the 
approximate time in prehistory when each instance of diversification in a tree
occurred. The most recent such claim was made by Russell Gray and his co-workers
(first in Gray & Atkinson 2003), and it was demolished by Andrew Garrett’s
team at Berkeley (Chang et al. 2015). The easiest way to discuss the problems
involved in dating linguistic divergences is to discuss Gray's work.

3 For a good exploration of the ways in which language families can diversify, see Ross 1997, with
exemplification in Ross 1998.

Gray claimed that new and more powerful Bayesian cladistic methods 
yielded greatly improved trees and — more importantly — allowed researchers 
to recover the time depths of particular “speciation events" in the prehistory of 
language families with greater precision. He applied his methods to the Indo- 
European family (at first to bad data, but increasingly to competently vetted 
wordlists) and derived dates for PIE that are compatible with Colin Renfrew's 
“out of Anatolia" scenario (Renfrew 1987), but not with the “steppe hypoth- 
esis" (Anthony 2007, Anthony & Ringe 2014) that most Indo-Europeanists 
have long believed to be most probable. Both Indo-Europeanists and computer 
scientists were inclined to dismiss Gray's work from the start. For one thing, 
Bayesian cladistics is not in any way mathematically superior to methods 
already available; it is merely fashionable. For another, it is not inaccurate to 
say that Gray took already available data and cranked them through prefabricated
software. But no one would have cared about that if the work had been
cogent. Unfortunately, there were always multiple reasons to suspect that it 
couldn't be cogent, as follows (see also Pereltsvaig & Lewis 2015 for further 
extensive discussion). 

First, Gray used only lexical data, which are the least reliable for cladistics 
(Ringe, Warnow & Taylor 2002: 65 with references; Nakhleh et al. 2005).

Secondly, there is no lexical "clock": that is, the replacement of vocabulary
items does not proceed at an even approximately constant rate (Bergsland &
Vogt 1962). Moreover, none of the other simplifying assumptions about the rate
of word replacement holds up empirically. For instance, the "rates across sites" 
assumption sometimes encountered in biological cladistics — namely, that if 
one character evolves, say, half again as quickly in lineage A as in lineage B, 
you can count on other characters to do the same — clearly does not hold in 
language development. Gray (and others who have worked in linguistic cladis- 
tics, for instance the late Isidore Dyen; see Dyen, Kruskal & Black 1992) have 
suggested that that need not matter: if you let the assumed rate of change vary
randomly around a mean, the result will be realistic. But it's not clear that even 
that is loose enough; and of course the wider the variation in rates of change, the 
more uncertain the hypothetical dates of proto-languages become. 
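
How strongly an inferred date depends on the assumed rate can be illustrated with a deliberately naive model. The sketch below assumes, purely for illustration, that a single lineage retains a fixed fraction of a wordlist per millennium, so that the share retained after t millennia is rate ** t; this is exactly the kind of constant-rate assumption that the text says does not hold. The function name, the observed share and the three rates are hypothetical values, not data from any study.

```python
from math import log

def implied_age(retained: float, rate_per_millennium: float) -> float:
    """Millennia implied by a retained share under a constant retention rate.

    Solves retained = rate_per_millennium ** t for t.
    """
    return log(retained) / log(rate_per_millennium)

# Hypothetical observation: half of the ancestral wordlist survives.
observed = 0.5

# Hypothetical retention rates per millennium; small changes in the assumed
# rate produce large changes in the implied time depth.
for rate in (0.7, 0.8, 0.9):
    print(f"rate {rate}: about {implied_age(observed, rate):.1f} millennia")
```

Even in this toy setting the implied age more than triples across the three rates, which is the sense in which wider variation in rates makes proto-language dates more uncertain.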

Thirdly, there are serious evidential problems which have an impact on the 
mathematics of trying to work backwards into prehistory. Steve Evans and co- 
authors laid out the problem in formal terms in their article of 2006, but it can 
also be stated informally (Bob Berwick, p.c.). To paraphrase Berwick, we want 
a theory that can infer backwards in time from a currently observed state so as 
to recover the dynamic processes that led to that state. In order to describe what 
happened accurately, we need to know (a) the nature of the forces that have
operated, (b) the magnitude of those forces, (c) the length of time over which 
they have operated, and (d) the initial state. In this case we are trying to derive 
(c), so we need to have all the other variables fixed. We linguists believe that we 
understand (a) well enough; but (d) is invariably full of gaps — there are some 
things about the proto-language that we simply cannot reconstruct because not 
enough evidence survives anywhere — and empirical observation shows that (b) 
varies within limits which are incompletely known but clearly wide. At least 
one further problem is the loss of data which can never be recovered, as 
follows. If a word x in a given meaning can be reconstructed securely for the 
proto-language, and if in the earliest records of a daughter it has been replaced 
by y, we know that at least one episode of replacement occurred in the 
unobservable prehistory of that daughter; we do not know whether only one 
or more than one occurred. Thus even if the rate of vocabulary replacement 
were more nearly constant, we could not use it to extrapolate into prehistory 
with any confidence. 

Fourthly, incorrect assumptions about the descent of particular languages in 
the tree can lead to unforeseen problems in calculating time depths. That was 
shown brilliantly by Chang et al. 2015. They noted that, while both Latin and 
various Romance languages were in the database of Gray's project, the algorithm
was not informed that Latin was the ancestor of the Romance languages
(and likewise with Sanskrit and modern Indic, and a few other, less substantial
cases). The program thus returned a tree in which Latin was the sister of the
Romance group, Sanskrit was the sister of modern Indic, and so on. The result 
was to lengthen the time depths calculated from the wordlists. Chang et al. 
introduced constraints forcing the program to treat Latin as the ancestor of 
Romance, etc. — and the time depths shortened dramatically, yielding a date for 
PIE compatible with the steppe hypothesis of Anthony and others and not with 
Renfrew's “out of Anatolia" hypothesis. Gray has protested that Classical Latin 
is not exactly the ancestor of Romance, but Chang et al. replied (correctly) that 
if all you're using is basic wordlists, the right question is whether the Latin 
wordlist is the ancestor of the Romance wordlists (so far as we can tell), and the 
answer is clearly yes (see the extensive discussion of Chang et al. 2015: 205-8). 
Of course this all illustrates the fact that if you want to pursue linguistic
cladistics you need to have both a world-class linguist and a competent com- 
puter scientist on the team, but it also illustrates something else: the results of 
Bayesian cladistics are not robust; you can tweak one detail and get dramatic- 
ally different results. 

Finally, there is a further problem with Bayesian analyses, which was 
pointed out in a devastating paper by Bob Berwick (Berwick 2015, unfortunately
still unpublished). Berwick noticed that the “higher” nodes in Gray's
best tree had low bootstrap values, often no better than 20–30 per cent. Of
course, the alternatives all had even lower bootstrap values, so the tree
presented could be called the “most probable" consensus tree; but 
a 30 per cent probability is just not probable enough — bootstrap values 
that low are unacceptable to a real computational cladist. Berwick ran 
appropriate software on Gray's data thousands of times and superimposed 
all the trees returned to give a visual impression of the problem; the top of 
the tree was a blur, with no resolution — and that remained true even when 
a million iterations were run. But if you can't be sure you have the right tree, 
it's not feasible to estimate divergence times. Unfortunately that applies to 
Garrett's results no less than to Gray's, since Garrett's team set out to 
replicate Gray's experiment. 
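
The notion of a support value can be made concrete with a small sketch: the support of a clade is taken here to be the share of sampled (or resampled) trees in which that clade appears. The ten trees below are invented solely for illustration, each reduced to the set of clades it contains; they are not Gray's or Berwick's data, and the function name is hypothetical.

```python
from collections import Counter

def clade_support(trees):
    """Return, for each clade, the fraction of sampled trees that contain it."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: n / len(trees) for clade, n in counts.items()}

# Hypothetical sample of ten trees over taxa A-D. Each tree is represented
# only by its clades, and each clade is a frozenset of taxon names.
CD = frozenset("CD")
AB = frozenset("AB")
ACD = frozenset("ACD")
BCD = frozenset("BCD")
sample = [{CD, AB}] * 3 + [{CD, ACD}] * 3 + [{CD, BCD}] * 4

support = clade_support(sample)
for clade, value in sorted(support.items(), key=lambda kv: -kv[1]):
    print("".join(sorted(clade)), value)
```

In this invented sample the low-level clade is found in every tree, while the best-supported higher grouping appears in well under half of them: it is the "most probable" option and still not a result one could rely on.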

In fact, the dispute between Renfrew and most of our community has been 
resolved in favour of the steppe hypothesis, but neither by archaeologists nor by 
linguists; the crucial evidence is ancient DNA evidence. Haak et al. 2015 
demonstrated that there was a major population incursion from the steppes 
into Europe in the middle of the third millennium BCE — more or less exactly as 
the steppe hypothesis had posited — and that the distribution of steppe DNA 
correlates well with later populations known to have spoken Indo-European 
languages (see especially Mallory 1989). Those findings are irreconcilably 
inconsistent with Renfrew's scenario, according to which Indo-European
languages should have spread first from Anatolia to the Mediterranean lands 
and from there to northern Europe. That illustrates the most important conten- 
tion of this chapter: that information from all disciplines must be used, since 
any one source of information is inconclusive. 


4.6 Conclusion 


The general conclusion of this chapter is neither sweeping nor startling. We 
should use computational cladistics for what it's worth, but we need to be aware 
that its worth is limited. The general rule about extrapolating into the unob- 
served past still applies: results are comparatively secure when different lines of 
evidence converge on the same result. Computational cladistics yields only one 
line of evidence; therefore, it must be used in conjunction with traditional 
methods, archaeology, ancient DNA evidence and everything else that might be 
relevant. 


5 Anatolian


Alwin Kloekhorst 


5.1 Introduction 


The Anatolian branch consists of a group of languages once spoken in ancient 
Anatolia (modern-day Turkey) and northern Syria, with textual remains dating 
from the beginning of the second millennium BCE to the second century CE. It 
is commonly assumed that in the course of the first millennium CE, the entire 
Anatolian branch became extinct. The attested Anatolian languages are (in 
chronological order) as follows.²

Kanišite Hittite:³ a dialect of Hittite proper, which is known from hundreds
of personal names and a handful of loanwords attested in Old Assyrian texts
(clay tablets, written in the Old Assyrian version of the cuneiform script, dating
to c. 1935-1710 BCE) mostly stemming from Kaniš/Nēša (modern-day
Kültepe), Central Anatolia.

Hittite (“Hattusa Hittite”):⁴ the main language of the administration of the
Hittite kingdom, written in its own version of the cuneiform script, attested in
some 30,000 fragments of clay tablets (dating to c. 1650-1180 BCE),⁵ especially
found in the Hittite capital Hattusa (modern-day Boğazkale), but also
several other places in Central Anatolia. It is the best attested Anatolian
language by far, and therefore the most important witness of this branch.

1 ... the mother tongue: The position of Anatolian in the dispersal of the Indo-European language
family (NWO-project number 276-70-026) and the project Multilingualism and minority
languages in ancient Europe, funded by the HERA Joint Research Program “Uses of the past”
(Horizon 2020). I would like to thank Xander Vertegaal and Stefan Norbruis for their useful
comments on earlier drafts of this chapter.
2 Kroonen, Barjamovic & Peyrot (2018) have recently claimed that a number of personal names
that are recorded in texts from Ebla, dated to the twenty-fifth-twenty-fourth centuries BCE,
belong to one or more languages “that clearly fall within the Anatolian Indo-European family”
(2018: 6). However, no detailed analysis of this material is offered, and at present I therefore
regard the linguistic status of these names as too uncertain to make any broad claims.
3 See Kloekhorst 2019 for a full account of this language and its attestations.
4 The authoritative synchronic grammar of Hittite is Hoffner & Melchert 2008. Synchronic
dictionaries are HW² and CHD; etymological dictionaries are HEG, HED, EDHIL. For historical
linguistic treatments, see e.g. Melchert 1994; Kimball 1999; EDHIL.
5 But see Kloekhorst & Waal 2019, who argue that a few Hittite tablets may stem from the latter
half of the eighteenth century BCE.

Palaic:⁶ known from several passages embedded in Old and Middle Hittite
texts (sixteenth-fifteenth century BCE), primarily dealing with the cult of the 
god Zaparua. It was the language of the land of Pala, situated in the north-west 
of Central Anatolia. The Palaic corpus is small, and therefore many basic 
matters regarding grammar and lexicon are unclear. 

Cuneiform Luwian (also called Kizzuwatna Luwian):⁷ only known from
cultic passages cited in Hittite texts (dating to the sixteenth-fifteenth century 
BCE). It was certainly spoken in Kizzuwatna (south-east of Central Anatolia) 
and possibly also in the western part of Anatolia. In Hittite texts from the New 
Hittite period (fourteenth-thirteenth century BCE), we find many Luwian 
loanwords, which traditionally were regarded to be Cuneiform Luwian as 
well but which may be more appropriately regarded as linguistically belonging 
to Hieroglyphic Luwian. 

Hieroglyphic Luwian (also called Empire Luwian / Iron Age Luwian):⁸
closely related to Cuneiform Luwian, written in an indigenous hieroglyphic 
script (Marazzi 1998) that seems to have been especially designed for this 
language. Seals containing these hieroglyphs can be dated as far back as the Old 
Hittite period (c. 1600 BCE), but real texts (mostly inscriptions on rocks and 
stone steles) date from the thirteenth to the end of the eighth century BCE. The 
c. thirty texts that date from the last phase of the Hittite Kingdom (so-called 
Empire period, and therefore “Empire Luwian") are found all over Anatolia 
and northern Syria, whereas the c. 230 post-Empire period inscriptions (Iron 
Age, and therefore “Iron Age Luwian") are restricted to south-eastern Anatolia 
and northern Syria, the region of the so-called Neo-Hittite city states. Thanks to 
a boost in studies of the language since the publication of Hawkins 2000, 
Hieroglyphic Luwian has become one of the better-known Anatolian 
languages. 

Lydian:⁹ the language of the land of Lydia (central western Anatolia),
written in its own version of the Greek alphabet, attested in some 120 texts
(the bulk of which are inscriptions on stone steles), dating from the eighth to the
third century BCE (with a peak in the fifth-fourth century BCE). Our
knowledge of Lydian is limited since there are only a few bilingual texts and
since its vocabulary is difficult to compare to the lexicon of the other Anatolian
languages (see also below, Section 5.3.3).

6 For texts, grammar, vocabulary and historical phonology, see e.g. Carruba 1970; Kammenhuber
1969; Melchert 1994: 190–228.
7 Texts are collected in Starke 1985. For grammatical treatments, see Starke 1990; Melchert 2003.
For the lexicon, see Melchert 1993. See Yakubovich 2010 for the term “Kizzuwatna Luwian”.
8 Texts can be found in Hawkins 2000, see also ACLT. For grammatical treatments, see Melchert
2003; Payne 2010; Yakubovich 2015. A good Hieroglyphic Luwian dictionary is a desideratum:
Meriggi 1962 is largely outdated, and the lexical part of ACLT can only be used with caution.
9 For texts, grammar and vocabulary, see Gusmani 1964. Historical linguistic treatments can be
found in Melchert 1994: 329-83; Gérard 2005. A more general introduction to the Lydians and
their language is Payne & Wintjes 2016.

Carian:¹⁰ the language of the land of Caria (south-central western Anatolia),
written in its own version of the Greek alphabet, attested in some 200 inscrip- 
tions from the seventh-fifth century BCE from Egypt (tomb inscriptions from 
Carian mercenaries living there) and from the fourth-third century BCE from 
Caria itself. Our knowledge of Carian is very rudimentary: the Carian alphabet 
was not successfully deciphered until the 1990s, and many inscriptions contain 
personal names only. 

Lycian (also called Lycian A):¹¹ the language of Lycia (south-western
Anatolia), written in its own version of the Greek alphabet, in some 150 
coin legends and 170 inscriptions on stone, dating to the fifth-fourth 
century BCE. Our knowledge of Lycian is relatively advanced, partly 
because of some bilingual texts (including the large trilingual inscription 
of Letóon) and partly because of its linguistic similarities with the Luwian 
languages. Nevertheless, many details regarding grammar and lexicon are 
still unclear. 

Milyan (also called Lycian B):¹² attested in two inscriptions from Lycia
(fifth century BCE) that are written in the Lycian alphabet. Although the name 
“Milyan” refers to the region Milyas, situated in the north-east of Lycia, it is 
unclear where it originates. The two Milyan inscriptions, which both seem to be 
in verse, are difficult to understand, and our knowledge of Milyan is therefore 
rudimentary. 

Sidetic:¹³ the language of the city of Side (south coast of Anatolia) and its
surroundings, written in its own version of the Greek alphabet, attested in some 
ten inscriptions on coins and stone, dating to the fifth-second century BCE. The 
number of textual remains is very low, so we only know a few facts about 
Sidetic grammar and lexicon. 

Pisidian:¹⁴ a language attested in a few dozen tomb inscriptions in the Greek
alphabet that were found in the eastern part of classical Pisidia (south-west of 
Central Anatolia), dating to the first-second century CE. The inscriptions 
contain only personal names, some of which point to an Anatolian character 
to this language. 


10 See Adiego 2007 for a full discussion of all Carian texts, and the grammar, lexicon and historical
linguistic interpretation of the language.
11 For text editions see Kalinka 1901; Neumann 1979; Laroche 1979. The vocabulary is compiled
in Melchert 2004 and Neumann 2007. Grammatical treatments and historical linguistic analyses
can be found in e.g. Hajnal 1995; Melchert 1994: 282–328; Melchert 2004; Kloekhorst 2013.
12 Shevoroshkin 2013. The Milyan vocabulary is included in Melchert 2004 and Neumann 2007.
13 Pérez Orozco 2007.
14 Brixhe 1988.


5.2 Evidence for the Anatolian Branch


There is ample evidence to view the Anatolian languages as forming a single 
branch: they share enough linguistic features to set them apart as a single 
group vs. the rest of the Indo-European language family, cf. e.g. Rieken 2017:
299. A complicating factor, however, is the Indo-Anatolian hypothesis, which 
states that Anatolian was the first branch to split off from the Indo-European 
mother language, after which the remaining language, which was to become 
the ancestor language of all the non-Anatolian Indo-European languages 
(“Core Proto-Indo-European”), underwent a set of innovations (see
Section 5.5). Whenever the Anatolian languages show shared features that 
are different from the other Indo-European languages, we should therefore 
investigate to what extent these differences are caused by innovations that 
took place in the prehistory of the Anatolian branch, or by innovations in the 
prehistory of Core Proto-Indo-European. In the latter case, the Anatolian 
features may in fact be shared retentions, and therefore cannot, strictly 
speaking, be used in arguing that the Anatolian languages form a single 
branch. In practice, however, it is not always easy to distinguish between 
the two. 

Another complicating factor is that some of the Anatolian languages have 
a very limited attestation (especially Sidetic and Pisidian) or are in general 
poorly understood (Carian, Milyan and, to a lesser extent, Lydian and Palaic). 
This means that not all features listed below are found in all languages. 

The following specific features of the Anatolian languages can be regarded 
as examples of common innovations that prove the unity of the Anatolian 
branch and allow for the postulation of an ancestor language, Proto- 
Anatolian, from which they all derive: 


Phonology 

* the merger of PIE mediae and aspiratae into a single series that is called lenis
(PIE *d, *dʰ > PAnat. */t/),¹⁵ which is distinct from the so-called fortis series,
which is the outcome of PIE tenues (PIE *t > PAnat. */t:/)

* the operation of Eichner's lenition rules: (1) pre-PAnat. *VC:V > PAnat.
*VCV and (2) pre-PAnat. *V ... VC:V > PAnat. *V ... VCV¹⁶


15 As argued in e.g. Kloekhorst 2016: 226-8, within the glottalic theory this merger may be seen as
the result of the development of PIE mediae, which can be interpreted as pre-glottalized lenis
stops, e.g. PIE *d = *[ˀt], into a biphonemic pair of glottal stop + lenis stop: PIE *d = *[ˀt] >
pre-PAnat. *[ʔt] = */ʔ/ + */t/. In this way, the oral part of the PIE mediae was detached from its
glottal part and merged with the PIE aspiratae, which in fact were lenis stops (e.g. PIE *dʰ =
*[t]), whereas the glottal stop merged with the outcome of PIE *h₁.

16 Eichner 1973: 79, 100 n. 86. The two lenition rules can in fact be regarded as a single
development, which may be represented as pre-PAnat. *V(...)VC:V > PAnat. *V(...)VCV, cf.
Adiego 2001; Kloekhorst 2014: 547-87.
* PIE accented short *ó was lengthened to PAnat. long */ṓ/ (and subsequently
caused lenition according to Eichner’s first lenition rule)

* the PIE cluster *h₂u yields PAnat. monophonemic */qʷ:/,¹⁸ e.g. PIE *trh₂u(e)nt-
> PAnat. */trq".(o)nt > Hitt. tarhuuant- Aory*:ont-/, CLuw. tarhu(ua)nt- 
Atory*:(o)nt-/, Lyc. trqqàt- / trqqrit- /trk*(a)t-/, Car. trgó- Arkwt-/ 

* the development of a lateral in the word for ‘name’: PIE *h₁neh₃mn- > PAnat.
*/ʔlṓmn-/ > Hitt. lāman, HLuw. álaman-, Lyc. alãma-¹⁹


Morphology 

* the creation of an acc.-dat. form */ʔm:u(-)/ ‘me’ (vs. PIE *h₁mme-)

* the creation of a demonstrative pronoun */ʔopṓ-/ (from virtual PIE *h₁o-bʰó-)

* the loss of the distinction between present and aorist (the “tēzzi-principle”)²⁰

* the creation of the hi-conjugation (cognate to the PIE perfect)²¹

* the 1pl. ending */-uén(i)/ (cognate to the PIE dual ending *-ué)²²

* the replacement of the post-consonantal pret.act.3sg. ending *-t by the
middle ending *-to (> Hitt. -tta, CLuw. -tta, HLuw. -ta, Lyc. -te)²³

* the loss of the subjunctive and optative moods.


For other specifically Anatolian features, see Section 5.5, where a list of shared 
retentions of Anatolian will be presented (as arguments in favour of the Indo- 
Anatolian hypothesis). 


5.3 The Internal Structure of Anatolian 


There is some debate on the exact internal subgrouping of the Anatolian 
branch, although on some aspects there is broad consensus. 


5.3.1 The Luwic Branch 


There can be no doubt that Cuneiform Luwian, Hieroglyphic Luwian and 
Lycian form a separate branch, which is commonly called “Luwic”. This 


17 Kloekhorst 2014: 439-59.
18 Kloekhorst 2006: 102; Melchert 2011: 128-9. Cf. Kloekhorst 2018a for the postulation of
a labio-uvular stop /qʷ:/ for the PAnat. stage.
19 EDHIL: 192.
20 Malzahn 2010: 267-8.
21 E.g. Eichner 1975; Kloekhorst 2018b, contra Jasanoff 2003.
22 Jasanoff 2003: 3; EDHIL: 1001.
23 The idea that CLuw. -tta, HLuw. -ta and Lyc. -te reflect the middle ending *-to is generally
accepted (e.g. Yoshida 1993), but the origin of Hitt. -tta is debated. Some scholars assume that
the spelling °C-ta can represent /°Ct/ < PIE *°C-t (e.g. Yoshida 1991: 28); others assume that the
a-vowel is real and developed as a prop-vowel, i.e. /°C-to/ < PIE *°C-t (e.g. Melchert 1994:
175-6, with references); and still others have argued that the a-vowel is real but cannot be
explained as a prop-vowel, and that Hitt. -tta therefore must reflect earlier *-to (EDHIL: 800-1).
If the latter view is correct, the spread of *-to at the cost of post-consonantal *-t must have been
a common innovation of all Anatolian languages.
means that these languages derive from a “Proto-Luwic” mother language. It is
generally assumed that Milyan and Carian belong to this branch too, and also 
Sidetic and Pisidian are often regarded as possibly Luwic languages (e.g. 
Melchert 2003: 170-7; Yakubovich 2010: 6; Rieken 2017: 301-3). 


5.3.1.1 Shared Innovations of the Luwic Languages The Luwic sub-branch 
of Anatolian can be defined through the following innovations (although they 
are not always attested in all languages): 


Phonological 

* the assibilation of PAnat. */k:/ > PLuwic */ts/ > CLuw. z /ts/, HLuw. z /ts/, 
Lyc. s, Mil. s, Car. s, Sid. s (vs. Hitt. /k:/, Pal. /k:/, Lyd. k) 

* the weakening of PAnat. lenis */k/ > PLuwic *i > Ø: e.g. PAnat. */kés:r-/
‘hand’ > CLuw. iš(ša)ri-, HLuw. istri-, Lyc. izri- (vs. Hitt. /k/ and Pal. /k/)

* the weakening of PAnat. lenis */kʷ/ > PLuwic *u: e.g. PAnat. */kʷóu-/ ‘cow’
> HLuw. wawa/i-, Lyc. wawa-, uwa- (vs. Hitt. /kʷ/, Pal. /kʷ/, Lyd. k)

e the merger of PAnat. */e/ and */6/ into PLuwic */3/** (vs. their retention as 
separate phonemes in Hittite, Palaic and probably Lydian») 

e Cop's Law: PAnat. *VCV > PLuwic *VC:V"° 


Morphological 

* the large-scale spread of the proterodynamic i-stem inflection replacing original
consonant stem and o-stem inflection (formerly called “i-mutation”)²⁷

* the reshaping of the PAnat. nom.pl.c. ending *-es to PLuwic *-Vns-i (based
on the acc.pl.c. ending *-Vns < PIE *-V-ms + the original pronominal nom.pl.c.
ending *-i < PIE *-oi?) > CLuw. -Vnzi /-Vntsi/, HLuw. -V-zi /-Vntsi/, Lyc.
-i (< *-insi), -ẽi (< *-onsi), -ãi (< *-ansi), Car. -š (?)

* the grammaticalization of the genitival adjective in *-os:o/i- > CLuw. -assa/i-,
HLuw. -asa/i-, Lyc. -ehe/i-, Mil. -ehe/i-, Car. -š (?), Sid. -as V, Pis. -s (?)²⁸


24 The development of the Carian vowel system is too poorly understood for us to be certain that
Carian was part of this development. If it was not, this isogloss should be removed from the
inventory.

25 Note that the prehistory of the Lydian vowel system is still relatively unclear.

26 Cf. Kloekhorst 2014: 567-85 for the fact that this law is not only valid in CLuwian (for which it
was originally formulated, cf. Čop 1970), but also in HLuwian and Lycian.

27 See Norbruis 2021: 9-50 (adapting Rieken 2005) for a full treatment of the phenomenon that in
the prehistory of the Luwic branch the proterodynamic i-stem inflection (synchronically
characterized by the presence of an -i- in the nom.sg./pl.c. and acc.sg./pl.c. cases vs. the absence
of -i- in all other cases, therefore termed “i-mutated”), which it had inherited from PIE, spread
widely within the nominal system, first to consonant-stems, and later to *o-stems (but not to *ā-
and *u-stems). See below for the fact that Palaic and Lydian also show some cases of this spread.

28 This suffix is attested in Hittite, too, but it has not been grammaticalized as an inflectional
morpheme, cf. EDHIL: 216.
* the spread of the pret.act.3sg. ending *-to to verbal stems ending in a vowel
(at the cost of the original ending *-t)²⁹


Lexical 

* PLuwic *mas:Vn- ‘god’ > CLuw. massani-, HLuw. DEUS-ni-, Lyc. mahana-,
Mil. masa-, Car. mso-, Sid. masara- (vs. PAnat. *tieu- (< PIE *dieu-) in Hitt.
šiu-, Lyd. ciw-)³⁰

* PAnat. *t:rqʷ:(a)nt- ‘(one who has / has been) conquered’ develops into the
generic name for ‘Storm-god’ in PLuwic, yielding CLuw. tarhu(a)nt-,
HLuw. tarhunt-, Lyc. trqqñt-, Mil. trqqñt-, Car. trqδ-, all ‘Storm-god’ (vs.
Hitt. tarhuuant- ‘conquered’ and ᵈIŠKUR-unn(a)- ‘Storm-god’)³¹


5.3.1.2 Internal Subgrouping of the Luwic Branch The relationships 
between the three better-known Luwic languages — Cuneiform Luwian, 
Hieroglyphic Luwian and Lycian — are quite clear. It is generally accepted 
that Cuneiform Luwian and Hieroglyphic Luwian are closely related, yet 
distinct, dialects. The relationship between the two cannot have been a matter 
of one of them deriving from the other (cf. Melchert 2003: 171-2), which
means that both must go back to a common ancestor, which may be termed 
Proto-Luwian. 

Lycian is generally recognized as being closely related to the two Luwian 
languages. Yet, although it was attested almost a millennium after the latter's 
first attestations, it was clearly not a direct daughter language of either of them: 
the Luwian languages show innovations that are not shared by Lycian (e.g. 
merger of PLuwic */o/ and */a/ into PLuwian */a/; replacement of the dat.-loc.
pl. ending */-os/ (< PAnat. */-os/) by */-ons/³² > PLuwian */-ants/ (CLuw.
-anza /-ants/, HLuw. °a-za /-ants/); fricativization of */q:/ to PLuwian */χ:/
(Kloekhorst 2018a: 73-6)). This means that Lycian stems from a sister
language to Proto-Luwian and that both can be regarded as distinct daughters 
of Proto-Luwic. 

Although our knowledge of Milyan is limited, it is usually seen as being
closely related to Lycian. This is based on the fact that these two languages have
several linguistic features in common, which may be seen as shared innovations
that set them apart from Proto-Luwian: PLuwic *-Vs > Mil., Lyc. -V (as in
nom.sg.c.) (vs. PLuwian *-Vs); PLuwic *-Vn > Mil., Lyc. -V (as in acc.sg.c.) (vs.
PLuwian *-Vn); a-umlaut (e.g. Mil. nom.-acc.pl.n. uwadra vs. uwedr(i)-, or
nom.-acc.pl.n. χuzruwãta vs. acc.pl.c. χuzruwẽtiz); syncope in the ethnicon
suffix Mil. -wñni- and Lyc. -ñni- < *-wn:i- (vs. PLuwian *-wan:i-); fronting
of PLuwic */kʷ:/ before a front vowel in rel.pron. */kʷ:i-/ > Lyc. ti-, Mil. ki- /ci-/
(vs. PLuwian *kʷi-). A shared lexical innovation may be Mil. kibe ~ Lyc.
tibe ‘or’.


29 Whereas Hittite still shows the old opposition between post-consonantal *-to and post-vocalic
*-t. Palaic has also retained post-vocalic *-t (e.g. pret.act.3sg. lu-ki-i-it). The origin of Lyd.
pret.act.3sg. -l is not fully clear, unfortunately.

30 Eichner (1974: 64) proposed that Luw. *mas:Vn- ‘god’ derives from a pre-form *meh ;/3-(0)s-h3on-
“freien Willen habend, nach eigenem Belieben handelnd” (~ Lat. mōs ‘custom, usage’, which
Eichner translates as “Wille”). An alternative may be to derive *mas:Vn- from *meh₁ns-en-,
a derivative of PIE *meh₁-ns- ‘moon’.

31 Although it cannot be excluded that already in PAnat. *t:rqʷ:(a)nt- was the name of the
Storm-god, and that Hitt. ᵈIŠKUR-unn- is an innovation, cf. Kloekhorst 2019: 192.

32 Having taken over the nasal of acc.pl.c. */-ns/ < PIE *-ms.

The position of Carian, Sidetic and Pisidian is less clear, since the number of 
possible isoglosses is very low. In the case of Carian, Adiego (2007: 347) states 
that “a meaningful isogloss shared by Carian and Milyan is the copulative 
conjunction Car. sb, Mil. sebe ‘and’”, which contrasts with Lyc. se ‘and’. One 
may add Car. mso- ~ Mil. masa- vs. Lyc. mahana- ‘god’. In the case of Sidetic, 
the dat.pl. ending -a (in masara ‘to the gods’), which must reflect PAnat. */-os/, 
shows that this language does not belong to the Luwian subgroup (which rather 
shows the dat.pl. ending */-ants/). Furthermore, this ending shows that Sidetic, 
just like Lycian and Milyan, has undergone the development *-Vs > -V, which 
may be seen as a shared innovation. On the basis of the Sidetic conjunction sa
‘and’, we may assume a closer affinity with Lycian, which has se ‘and’ (vs. Mil. 
sebe and Car. sb). In the case of Pisidian, a closer affinity with the Lyco-Carian 
subgroup may be seen from the nom.sg.c. ending -V, which then corresponds to 
Lyc. -V, Mil. -V, Car. -Ø < PLuwic *-Vs (vs. CLuw. -Vš and HLuw. -Vs). The 
exact position of Pisidian within this group must remain undetermined, 
however. 

All in all, the tree of the Luwic sub-branch may be envisaged as in Figure 5.1. 


Figure 5.1 The Luwic sub-branch of Anatolian [tree diagram not reproduced; node labels: Pisidian, Sidetic, Lycian, Milyan, Carian, Lyco-Carian, Hieroglyphic Luwian, Cuneiform Luwian, Luwian]


5.3.1.3 Dating Proto-Luwic The Luwic branch seems to have been relatively
shallow. As mentioned above, the linguistic difference between
Cuneiform Luwian and Hieroglyphic Luwian is minimal, and we may therefore
date their pre-stage, Proto-Luwian, to not much more than a handful of
generations before the oldest attested Cuneiform Luwian texts (sixteenth century
BCE), i.e. to c. the eighteenth century BCE. In the same vein, the difference
between the Lyco-Carian branch and Proto-Luwian seems to have been
relatively small, so we may assume that Proto-Luwic preceded Proto-Luwian by no
more than two or three centuries. We can thus approximately date this stage to
the twenty-first-twentieth century BCE.
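
The dating chain in this paragraph is simple century arithmetic, which a minimal Python sketch can make explicit. The function name and the representation of centuries as plain integers are illustrative assumptions, not part of the source; the figures (eighteenth century BCE for Proto-Luwian, a gap of two to three centuries) are the ones given above.

```python
def proto_luwic_window(proto_luwian_century_bce=18, gap_centuries=(2, 3)):
    """Return (earliest, latest) century BCE estimated for Proto-Luwic.

    Centuries BCE count upwards as we go back in time, so an older
    stage has a larger century number.
    """
    shortest_gap, longest_gap = gap_centuries
    earliest = proto_luwian_century_bce + longest_gap
    latest = proto_luwian_century_bce + shortest_gap
    return earliest, latest


earliest, latest = proto_luwic_window()
print(f"Proto-Luwic: centuries {earliest} to {latest} BCE")
```

With the default values the window comes out as the twenty-first to twentieth century BCE, matching the estimate in the text; changing either argument shows how sensitive the estimate is to the assumed gap.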


5.3.2 The Position of Palaic 


Since our knowledge of Palaic is limited, it is not easy to determine its position 
within the Anatolian language family with certainty. Moreover, as Carruba 
(1970: 4) and Melchert (2003: 269) show, Palaic shares linguistic features both 
with Hittite and with the Luwic languages, adding to the difficulty. 
Nevertheless, Oettinger (1978) gives several arguments that would indicate 
that Palaic is more closely related to the Luwic languages than to Hittite and 
Lydian. According to Rieken (2017: 303), however, “none of the isoglosses 
suggested so far [i.e., by Oettinger and others — AK] involve newly created 
morphology. In each case, the change consists of a choice among several 
inherited morphemes or a shift of a category's function, mostly extending it”,
and she therefore remains agnostic about the position of Palaic. To my mind,
this is too negative a view: there certainly are some features that in fact can be
used for judging its place in the Anatolian tree.

* The Palaic dative of the 3sg. enclitic pronoun, =tu ‘to him/her’, is identical to
CLuw. =tu and HLuw. =du /=ru, but distinct from Hitt. =sse (later =ssi) and
Lyd. =mλ. Oettinger (1978: 78-9) convincingly argues that this =tu originally
was the dative form of the 2sg. enclitic pronoun, which was extended to
the 3rd person. This non-trivial development was thus a shared innovation of
Palaic and the Luwian languages.³³

* In Palaic, Proto-Anatolian lenis */kʷ/ (< PIE *gʷ⁽ʰ⁾) is weakened to /xʷ/ or /ɣʷ/
in ahuuānti ‘they drink’ < *h₁gʷʰénti. This fricativization may be seen as
a first step towards the full weakening that is found in Luwic, where PAnat.
*/kʷ/ > *u.³⁴

* According to Starke (1990: 71-5), Palaic shows some instances of “i-mutation”,
indicating a connection with the Luwic branch. Since it has now
become clear that the “i-mutation” inflection in fact goes back to a normal
PIE proterodynamic i-stem inflection (cf. footnote 27), the mere existence of
this type in Palaic is not remarkable per se. However, as noticed by Starke, in
Palaic the “i-mutated” inflection also seems to be found in original
consonant-stems (e.g. ?ilaliant(i)-). This implies a secondary spread that is
comparable to the one found in the Luwic branch, and which may then be viewed as
a shared innovation. Nevertheless, the fact that our evidence for “i-mutated”
stems in Palaic is scanty shows that this spread certainly had not yet taken
place on such a large scale as in the Luwic languages.


33 Note that the corresponding Lycian morpheme is =i(je), which then must be a later innovation
through analogy after the nominal dat.sg. ending -i(je).

34 If Watkins’ suggestion (apud Melchert 1990: 207) that Pal. kuwani- means ‘womanly’ (i.e. from
the PIE stem *gʷen-/-) is correct, it would show that this weakening did not take place in
word-initial position.

* In Palaic, the pret.act.3pl. ending is -(a)nta, which matches Luwic *-Vnto
(CLuw. -anta, HLuw. -anta, Lyc. - Pte),³⁵ but contrasts with Hitt. -er and Lyd.
-rs / -ris. Since *-Vnto is generally regarded as deriving from the PIE 3pl.
middle ending *-ento, it may be possible to see the transfer of this ending to
the pret.3pl. of the active as a common innovation of Palaic and Luwic.³⁶

* In Palaic, the only attested pret.act.1sg. ending is -(h)ha, which reflects
PAnat. */-q:a/ < PIE *-h₂e, and thus originally belonged to the
hi-conjugation. Since it is also found in the form aniehha ‘I did’ (thus
Carruba 1970: 50), which was probably originally mi-conjugating, it seems
that in Palaic the pret.act.1sg. hi-ending -(h)ha has fully ousted the
corresponding mi-ending *-m (attested in Hitt. -un, -nun and Lyd. -v). The same
development took place in Luwic, where pret.act.1sg. *-q(:)a (CLuw. -(h)ha,
HLuw. -ha, Lyc. -χa, -ga) has fully ousted *-m as well. We may thus assume
that Palaic and Luwic shared this innovation.³⁷

Although the material is scanty and the number of arguments low, it does seem
safe to conclude that Palaic shares some innovations with the Luwic branch.
Nevertheless, it is clear that Palaic cannot be regarded as a proper Luwic
language: for instance, it does not show assibilation of PAnat. */k:/ (which
rather yielded Pal. k; Melchert 1994: 210), and it does not show a nom.pl.c.
ending *-Vnsi (but rather -aš and -es). We should therefore assume that Luwic
and Palaic are related on a higher node, which may be termed Luwo-Palaic.³⁸


35 Yoshida’s scenario, by which the Palaic ending -(a)nta has a different origin from the Luwic
ending *-Vnta < PIE *-Vnto (Yoshida 1991: 370-1), seems too complicated to me.

36 We may also assume that this transfer took place as early as in pre-Proto-Anatolian times, and in
fact consisted of the replacement of the original pret.act.3pl. ending *-Vnt < PIE *-(e)nt by its
middle variant *-Vnto < PIE *-(e)nto in a reaction to the loss of word-final *-t, just as
pret.act.3sg. *-t was for the same reason replaced by middle *-to (cf. Section 5.2). This would fit the
fact that a pret.act.3pl. ending *-an cannot be reconstructed for Proto-Anatolian (contra Yoshida
1991). If correct, we have to assume that Proto-Anatolian, next to hi-conjugated pret.act.3pl.
*-er / *-rs (with *-rs being the original zero-grade variant of *-er?), possessed the
mi-conjugated ending *-(V)nto, and that in all branches only one of these endings survived.
Hittite generalized the ending *-er, Lydian the ending *-rs, and Palaic and Luwic the ending
*-(V)nto. This spread of the mi-conjugated ending *-(V)nto at the cost of hi-conjugated *-er /
*-rs may then be seen as a shared innovation of Palaic and Luwic.

37 Yakubovich (2010: 6) cites this isogloss as the defining feature of the Luwo-Palaic subgroup.

38 Thus also Oettinger 1978: 92; Yakubovich 2010: 6. See now also Giusfredi 2020: 18-19.
5.3.3 The Position of Lydian


The exact position of Lydian is widely debated, which is due to the fact that this 
language is poorly understood: only a few Lydian words can be securely 
translated, making it difficult to establish etymologies and thus sound corres- 
pondences with the other Anatolian languages. Nevertheless, there seems to be 
more and more consensus that Lydian, too, was related to the Luwic sub- 
branch, since the two share some isoglosses: 

* “i-mutation” (cf. Sasseville 2017): since the “i-mutation” inflection reflects
the normal PIE proterodynamic i-stem inflection (Norbruis 2021: 9-50; see
also footnote 27), its presence in Lydian is not remarkable per se. However,
its presence in nouns like sfardẽt(i)- ‘Sardian’ (nom.sg.c. sfardetis vs. dat.pl.
sfardẽtav), which originally was probably an *-nt-stem, implies the spread of
the “i-mutation” inflection at the cost of the consonant-stem inflection, which
would be an innovation shared with Luwic (and Palaic).

* Lydian pres.act.1sg. -u/-w is identical to PLuwic *-ū (CLuw. -ui, HLuw. -wi,
Lyc. -u),³⁹ which contrasts with Hitt. pres.act.1sg. -mi and -hhi (unfortunately,
in Palaic no pres.act.1sg. forms are attested). However, if *-ū indeed
goes back to the PIA thematic pres. 1sg. ending *-oH (Kloekhorst 2013: 146),
the ending is not the result of an innovation. Nevertheless, the fact that both in
Luwic and in Lydian (as far as we can tell) *-ū < *-oH ousted the athematic
mi-conjugation ending *-mi and the hi-conjugation ending *-h₂e-i (which
were retained in Hittite, where a newly created *-o-mi ousted original *-oH)
can be seen as a common innovation.⁴⁰

* Lydian -cuwe- ‘to erect(?)’ is regarded by Oettinger (1978: 89) and Melchert
(2003: 269) as cognate to CLuw. tūua- ‘to place’, HLuw. tuwa-ⁱ ‘to place’ and
Lyc. tuwe- ‘to place (upright)’, which all reflect a stem *tuuV-. Although
there are different views on the exact origin of this formation, it is mostly
seen as an innovation, which then must have been shared by Lydian and the
Luwic languages.⁴¹


39 As Stefan Norbruis and Oscar Billing (pers. comm.) have pointed out to me, since Lycian does
not show a general loss of word-final *-i, Lyc. -u is better derived from PLuw. *-ū than from *-ūi
or *-ui. This means that we have to assume that in the Luwian languages the original ending *-ū
was secondarily extended with the present marker *-i.

40 Yakubovich (2010: 6) cites this isogloss as the defining feature of the “Non-Hittite” subgroup.

41 According to Oettinger (1978: 89), the stem tuuV- is based on a false segmentation of the
pres.1pl. form *tuuan(i) < *(dʰe-)dʰh₁-uéne(-i) of the verbal root *dʰeh₁- ‘to put’. Frotscher
(2012) argues that *tuuV- derives from earlier *dʰh₁-oi-, the stem that is found in Hitt. dai- / ti-
‘to put’, also derived from PIE *dʰeh₁-. And Melchert (2004: 74) rather derives *tuuV- from
a stem *(s)teh₂w-, ultimately belonging to PIE *steh₂- ‘to stand’. Since in all languages *tuuV-
means something like ‘to erect’, a connection with PIE *steh₂- may indeed be more attractive
than a connection with *dʰeh₁-. Nevertheless, in all analyses the stem *tuuV- is to be viewed as
an innovation.
Note that we are not necessarily dealing with a shared innovation in all cases in
which Lydian coincides with Luwic:

* Lyd. taada- ‘father’ < *tóto- is cognate with PLuwic *tóti- (CLuw. tāti-,
HLuw. tati-, Lyc. tedi-, Car. ted), which differs from Hitt. atta- and Pal. papa-
‘father’. However, since it cannot be excluded that *tóto- is the
Proto-Anatolian form, whereas Hitt. atta- and Pal. papa- are innovations, this
isogloss between Lydian and Luwic (see below for the difference in
“i-mutation”) could in principle represent a shared retention and is therefore
non-probative.

* Lydian has a 1sg. reflexive particle =m, which is identical to PLuwic *=mi
(CLuw. =mi(?), HLuw. =mi), but contrasts with Hittite, which uses =z(a) <
*=ti in this function (no attestations known for Palaic). Since it cannot be
excluded that Lydian and Luwic reflect the Proto-Anatolian situation,
whereas Hittite may have undergone an innovation, this isogloss may
represent a shared retention and therefore is non-probative.

Moreover, there are also some Luwic isoglosses in which Lydian clearly does
not participate:

* PAnat. lenis */kʷ/ > Lyd. k in kãna- ‘woman’ < *gʷoneh₂- (whereas in PLuwic,
PAnat. */kʷ/ is weakened to *u, e.g. *gʷoneh₂- > CLuw. uana-)

* Lyd. ciw- ‘god’ < PAnat. */tieu-/ < PIE *dieu- (vs. PLuwic *mas:Vn- ‘god’)

* Lyd. a-stem noun taada- ‘father’ (vs. PLuwic “i-mutated” *tóti-, see the
forms cited above)

I am therefore reluctant to view Lydian as a proper Luwic language; rather,
I assume that both Lydian and Proto-Luwic derive from an earlier node. In
order to establish the position of this node vis-à-vis the Luwo-Palaic node as
assumed above, the following arguments can be used:

* The Lydian dat.sg. form of the 3rd person enclitic pronoun, =mλ ‘to him/her’,
can be derived from *=smei / *=smoi (Kloekhorst 2012: 169), which to my
mind is an archaic morpheme (cognate with the PIE element *-sm- as found
in, e.g., the Skt. pronominal stem tasm-). Lydian thus did not participate in
the Luwo-Palaic innovation by which the original dat.sg. of the 2nd person
enclitic pronoun, *=tu, was extended to the 3rd person.⁴²

* The Lydian pret.act.1sg. ending is -v, which reflects the PAnat.
mi-conjugation ending *-m. Since in Lydian no trace of the corresponding
PAnat. hi-conjugation ending *-q(:)a (< PIE *-h₂e) is found, we may assume
that *-m > Lyd. -v had been generalized at the cost of *-q(:)a. This would then
be a reverse development to the generalization of the hi-conjugation ending
*-q(:)a at the cost of *-m that took place in Palaic and Luwic, and which was
mentioned above as a possible shared innovation between these latter two
branches.

It is for these reasons that I assume that the “Luwo-Lydian” node must be
placed higher up the family tree than Luwo-Palaic (thus also Oettinger 1978:
92; Yakubovich 2010: 6).


42 It cannot be excluded, however, that Lyd. =mλ < *=smVi received its -m- from the
corresponding dat.pl. form *=smos (Hitt. =smas, CLuw. =mmas, Lyd. =ms) and originally was *=soi, thus
being directly cognate with Hitt. =sse. If this is the case, Lydian would still show an archaic
morpheme vis-à-vis the innovated *=tu of Luwo-Palaic, which would indicate that Lydian
should derive from a higher node.
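
The reasoning used here, namely that a language belongs under a node only if it takes part in the innovations defining that node, can be sketched as a small membership check in Python. This is a minimal illustration, not a phylogenetic method: the dictionary keys paraphrase two of the isoglosses discussed above, the language sets follow the attestations cited in the text, and the names `INNOVATIONS` and `participation` are hypothetical.

```python
# Two Luwo-Palaic isoglosses from the discussion above, each mapped to the
# languages in which the innovated form is cited as attested.
INNOVATIONS = {
    "3sg. dative enclitic =tu (extended from 2sg.)": {
        "Palaic", "Cuneiform Luwian", "Hieroglyphic Luwian",
    },
    "pret.act.1sg. hi-ending ousts mi-ending *-m": {
        "Palaic", "Cuneiform Luwian", "Hieroglyphic Luwian", "Lycian",
    },
}


def participation(language):
    """Count how many of the listed innovations a language takes part in."""
    return sum(language in languages for languages in INNOVATIONS.values())


for name in ("Palaic", "Cuneiform Luwian", "Lydian", "Hittite"):
    print(name, participation(name))
```

Lydian and Hittite score zero on both isoglosses while Palaic and Luwian score two, which is the pattern the text uses to place Lydian above the Luwo-Palaic node. Lycian scores one because its enclitic =i(je) is treated in footnote 33 as a later replacement.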


5.3.4 The Position of Hittite 


Hittite proper (“Ḫattuša Hittite”) knew a sister dialect, Kanišite Hittite, that is
very similar to it but deviates from it in some points (Kloekhorst 2019). This calls
for the postulation of a Proto-Hittite ancestor language that may have been
spoken only a few generations before the oldest attestations of Kanišite Hittite
(twentieth century BCE), i.e. around 2100 BCE.

As has become clear in the sections above, there are no clear linguistic
innovations that Hittite shares with any of the other Anatolian
languages.⁴³ As Rieken puts it, Hittite is “notorious for [its] conservatism”
(2017: 303). However, this does not mean that Hittite can be directly
equated with Proto-Anatolian: Hittite, too, has undergone its specific
innovations (e.g. the assibilation of dental stop + *i; the almost complete
elimination of paradigmatic alternations between fortis and lenis stops; the
reshaping of some verbal endings; the transfer of many mi-verbs to the
hi-conjugation (Norbruis 2021: 131-207); the spread of the n-suffix in
the word for ‘earth’; etc.).


5.3.5 Dating Proto-Anatolian 


Although it is difficult to say anything certain about the absolute dating of 
reconstructed ancestor languages, in the case of Proto-Anatolian we have seen 
that its two best-known branches, Luwic and Hittite, have proto-languages that 
are roughly contemporaneous: Proto-Luwic can be approximately dated to the 
twenty-first-twentieth century BCE, and Proto-Hittite to c. 2100 BCE. The 
difference between the two is quite sizable, and elsewhere (Kloekhorst in press) 
I have therefore argued that they may have been a millennium apart from each 
other, which would mean that Proto-Anatolian started to diverge sometime 
around the thirty-first century BCE. 


43 At first sight, the fact that both Lyd. ciw- and Hitt. šiu- < PAnat. */tieu-/ ‘god’ show assibilation /
palatalization of the word-initial */t/ may be seen as a shared innovation between these two 
languages. However, since Lydian shows other features that it shares with Luwo-Palaic, we 
have to assume that the assibilation in the word for ‘god’ is a parallel, not shared, innovation in 
these languages. 
5.3.6 The Dialectal Make-Up of Anatolian


Taking all these points into account, we arrive at a tree model of the Anatolian
branch as shown in Figure 5.2.⁴⁴

Figure 5.2 Tree model of the Anatolian languages [tree diagram not reproduced]
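
Since the figure itself is not reproduced here, the subgrouping argued for in Sections 5.3.1 to 5.3.4 can be written out as a nested structure. The following Python sketch is one reading of the prose, not a transcription of Figure 5.2: Sidetic and Pisidian are left out because the text calls their position unclear, and the helper names `path_to` and `common_node` are hypothetical.

```python
# Nested dict: each key is a node, each value holds its daughters.
ANATOLIAN = {
    "Proto-Anatolian": {
        "Proto-Hittite": {"Hattusa Hittite": {}, "Kanisite Hittite": {}},
        "Luwo-Lydian": {
            "Lydian": {},
            "Luwo-Palaic": {
                "Palaic": {},
                "Proto-Luwic": {
                    "Proto-Luwian": {
                        "Cuneiform Luwian": {},
                        "Hieroglyphic Luwian": {},
                    },
                    "Lyco-Carian": {"Lycian": {}, "Milyan": {}, "Carian": {}},
                },
            },
        },
    }
}


def path_to(tree, target, trail=()):
    """Return the tuple of nodes from the root down to target, or None."""
    for name, daughters in tree.items():
        here = trail + (name,)
        if name == target:
            return here
        found = path_to(daughters, target, here)
        if found:
            return found
    return None


def common_node(tree, first, second):
    """Return the lowest node that dominates both languages."""
    shared = None
    for a, b in zip(path_to(tree, first), path_to(tree, second)):
        if a != b:
            break
        shared = a
    return shared


print(common_node(ANATOLIAN, "Lycian", "Palaic"))
print(common_node(ANATOLIAN, "Lydian", "Cuneiform Luwian"))
print(common_node(ANATOLIAN, "Hattusa Hittite", "Lycian"))
```

The three queries return Luwo-Palaic, Luwo-Lydian and Proto-Anatolian respectively, which restates the ordering of nodes defended in the preceding sections.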


5.4 The Relationship of Anatolian to the Other Branches 


In 1994, Puhvel argued that Anatolian shares many linguistic features with 
Celtic, Germanic, Italic, Tocharian and, to a lesser extent, Greek, which would 
point to a genetic relationship between Anatolian and these “western” 
branches. For instance, Hitt. išpant-ⁱ ‘to libate’ matches Lat. spondeō and Gr.
σπένδω but has no cognates anywhere else. However, as Melchert (2016: 300)
rightly states, all such cases “can be interpreted as common retentions that just
happen to be preserved in Anatolian and the western dialects” and therefore
“simply are not probative” for determining the position of Anatolian in the
Indo-European family tree: only secured common innovations can be used to
this end. Melchert himself thinks that such common innovations between
Anatolian and “western” languages may indeed exist, but, to his mind, they
would rather prove “post-divergence contact between Anatolian and the western
dialects” (2016: 300), and thus have no bearing on the genealogical
position of Anatolian. Although space limitations do not allow me to examine
the four examples treated by Melchert (2016), it is quite clear that none of them
can withstand scrutiny. There is thus no reason to assume that Anatolian shares
any innovations, either contact-induced or caused by a genetic relationship,
with “western” Indo-European languages or, for that matter, with any of the
other Indo-European languages.


44 This tree largely coincides with the trees given by Oettinger (1978: 92) and Yakubovich
(2010: 6).


5.5 The Position of Anatolian 


A hotly debated issue with regard to Anatolian is the so-called Indo-Anatolian 
hypothesis (also “Indo-Hittite hypothesis”), which states that Anatolian was
the first branch to split off from the Indo-European mother language (which 
then may be called “Proto-Indo-Anatolian” or “Proto-Indo-Hittite”), after 
which the remaining language, which was to become the mother language 
of all the non-Anatolian Indo-European languages (and which may be called
“Core Proto-Indo-European”, “Nuclear Proto-Indo-European”, “Classical
Proto-Indo-European” or similar) underwent a set of innovations. There is
some debate on whether this hypothesis is valid at all, and if so, how large the 
gap is between the moment Anatolian split off and the time that the first split 
within Core Proto-Indo-European (CPIE) took place (which is usually 
thought to have been the split-off of Tocharian, see Chapter 6). Some scholars 
do not think that there is enough evidence for assuming an early split-off of 
Anatolian at all (Rieken 2009; Adiego 2016); others think that there may have 
been an early split, but that the gap between Anatolian and the next split is 
relatively modest (Eichner 2015; Melchert in press), whereas still others think 
that the gap is sufficiently large for the Proto-Indo-Anatolian ancestor lan- 
guage to be substantially different from Core Proto-Indo-European 
(Kloekhorst 2008: 7-11; Oettinger 2014; Kloekhorst & Pronk 2019). 

The validity of the Indo-Anatolian hypothesis can only be proven if enough 
secured shared innovations of the non-Anatolian languages can be found. In 
Kloekhorst & Pronk 2019: 3-6, a total of thirty-four linguistic features are 
listed in which Anatolian deviates from the other Indo-European languages, 
and which are presented as possible cases in which Anatolian has retained the 
original state of affairs, whereas the other Indo-European languages have 
undergone a common innovation (with twenty-three examples classified as 
“good candidates”, and eleven as “less forceful” but “promising” ones). These
include: 


Semantic Innovations 

* The Hittite participle suffix -ant- forms both active and passive participles,
whereas in CPIE the suffix *-e/ont- only forms active participles: narrowing
of the function of *-e/ont- in CPIE (Oettinger 2014: 156-7).

* The Hitt. active verb eš-ᶻⁱ < *h₁es- means ‘to sit’ next to its middle
counterpart eš-ᵃ⁽ʳⁱ⁾ < *h₁e-h₁s-, which means ‘to sit down’, whereas in
CPIE the middle verb *h₁e-h₁s- means ‘to sit’ next to the verbal root
*sed- ‘to sit down’: expansion of the meaning of *h₁e-h₁s- from ‘to sit
down’ to ‘to sit’, with replacement of *h₁e-h₁s- ‘to sit down’ by *sed-
(Norbruis 2021: 235-41).

* Hitt. harra-ⁱ < *h₂erh₃- means ‘to grind, crush’, whereas CPIE *h₂erh₃-
means ‘to plough’: semantic specialisation in CPIE (Kloekhorst 2008: 9).

* Hitt. mer- < *mer- means ‘to disappear’, whereas CPIE *mer- means ‘to die’:
semantic shift, through euphemism, in CPIE (Kloekhorst 2008: 8).


Morphological Innovations 

* Anatolian has two genders (common/neuter), whereas CPIE has three genders 
(m./f./n.): creation of the feminine gender in CPIE (e.g. Melchert in press). 

* Anatolian has nom. *ti(H), obl. *tu- ‘you (sg.)’, whereas CPIE has nom.
*tuH, obl. *tu-: spread of obl. stem *tu- to the nominative in CPIE
(Kloekhorst 2008: 8-9).

* Anat. *h₁eḱu- vs. CPIE *h₁eḱu-o- ‘horse’: thematization in CPIE
(Kloekhorst 2008: 10).

* Hitt. huuant- < *h₂uh₁-ent- vs. CPIE *h₂ueh₁nt-o- ‘wind’: thematization in
CPIE (Eichner 2015: 17-18).


Sound Changes 
* Anat. *h₂ = *[q:] and *h₃ = *[qʷ:] vs. CPIE *h₂ = *[ħ] or *[ʕ] and *h₃ = *[ħʷ]
or *[ʕʷ]: fricativization of uvular stops in CPIE (Kloekhorst 2018a).

* Hitt. amm- < *h₁mm- (< pre-PIE *h₁mn-) vs. CPIE *h₁m- ‘me’: degemination
of *mm to *m in CPIE (Kloekhorst 2008: 111 n. 234).

Although it is certainly possible that not all of the arguments listed in 
Kloekhorst & Pronk 2019 will eventually become generally accepted, it 
seems very unlikely that they will all be refuted, and the Indo-Anatolian 
hypothesis can thus be regarded as virtually proven. Moreover, since the 
number of arguments listed is relatively large and some of them concern 
significant structural innovations (especially the rise of the feminine gender 
in CPIE, including the creation of the accompanying morphology), it has been 
argued that the temporal gap between the Anatolian split and the subsequent 
Tocharian split (cf. Chapter 6) may have been in the range of 800-1000 years. 
With the Tocharian split commencing around 3400-3300 BCE, the Anatolian
split may be dated to the period between 4400 and 4100 BCE. If Proto-Anatolian
indeed first broke up into its daughter languages around the thirty-first century
BCE (see Section 5.3.5), it would mean that it had some 1,300-1,000 years to
undergo the specific innovations that define Anatolian as a separate branch (see
Section 5.2). Since these innovations include some large restructurings of
especially the verbal system (loss of the subjunctive and optative moods, merger
of the present and aorist aspects, creation of the hi-conjugation on the basis of
the PIE perfect), such a time span would certainly be fitting.
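
The two date ranges in this paragraph follow from adding the assumed gap to the Tocharian split and then subtracting the date of the Proto-Anatolian break-up. A minimal Python sketch of that arithmetic is given below; the function name `push_back` is hypothetical, and treating "the thirty-first century BCE" as the single year 3100 BCE is a simplifying assumption made so that the figures in the text can be reproduced.

```python
def push_back(window_bce, gap_years):
    """Shift an (earliest, latest) BCE window further back by a gap range.

    The earliest bound combines with the longest gap and the latest bound
    with the shortest gap, which gives the widest plausible window.
    """
    earliest, latest = window_bce
    shortest_gap, longest_gap = gap_years
    return earliest + longest_gap, latest + shortest_gap


TOCHARIAN_SPLIT = (3400, 3300)   # years BCE, as given in the text
GAP = (800, 1000)                # years between the two splits
BREAKUP = 3100                   # assumption: thirty-first century BCE as 3100 BCE

anatolian_split = push_back(TOCHARIAN_SPLIT, GAP)
time_available = tuple(year - BREAKUP for year in anatolian_split)

print(anatolian_split)   # (4400, 4100)
print(time_available)    # (1300, 1000)
```

The output reproduces the window of 4400 to 4100 BCE for the Anatolian split and the span of roughly 1,300 to 1,000 years available for the Anatolian-specific innovations.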
6 Tocharian


Michaël Peyrot


6.1 Introduction 


The Tocharian languages A and B are attested in manuscripts from the northern 
Tarim Basin, present-day Northwest China. Tocharian B is attested from about 
the fifth to the tenth centuries of the Common Era. Originally from Kuča, it
spread east to Yanqi and Turfan, probably in the late sixth and in the seventh
century. In Tocharian B itself, the language is referred to as the language of kuśi
‘Kuča’. Tocharian A is attested a little later, from about the seventh to the tenth
centuries. It is originally from Yanqi, spread with Tocharian B east to Turfan,
but not west to Kuča, and is referred to as the language of ārśi ‘Yanqi’. Both
languages are written in the Indian Brahmi script, and the vast majority of the 
manuscripts are of Buddhist content. 

Traces of a third Tocharian language have been claimed to be preserved in 
the Middle Indic Gandhari dialect of Niya in the southern Tarim Basin (Burrow 
1935). This hypothesis has not received wide support and must still be
considered very uncertain (see further below in Section 6.3).¹


6.2 Evidence for the Tocharian Branch 


The existence of the Tocharian branch of Indo-European is beyond any doubt. The
two languages A and B are closely related and share numerous significant
innovations, so it is unnecessary to give a full list here. Some of the more important,
branch-defining developments are:

* loss of the threefold Proto-Indo-European distinction between the
conventionally termed voiceless, voiced and voiced aspirated stops, i.e. *k, *g, *gʰ
merged into *k (on *d, see below);


1 I will not discuss in detail a posthumously published proposal by Schmidt (2018: 161-271) to
read previously undeciphered manuscript fragments in Formal Kharosthi as a Tocharian variety
from Lóulán. His tentative decipherment is not convincing. Instead, these fragments are probably
written in an Iranian language related to Khotanese and Tumšuqese (Dragoni, Schoubben &
Peyrot 2020).
* several mergers and shifts in the vowel system, including loss of vowel 
length, merger of *i, *e, *u into *ə (the first two regressively palatalising), 
shifts of *o to *e and of *ā < *eh₂ to *o, monophthongisation of *ei to *'i and 
of *eu to *'u, etc.; 

* rise of distinctive and morphological palatalisation, principally through the 
transformation of the contrast between *o : *e into *e : *'e and *Ø : *e into 
*ə : *'ə; 

* loss of word-final *-s, *-m, *-n, *-t (*-d), which has led to heavy restructur- 
ing of both the nominal and the verbal inflection; 

* rise of agglutinative case inflection in the noun, next to agglutinative number 
inflection in some noun classes; 

* almost complete loss of prefixing morphology; 

* rise of an intricate system of verbal derivation to form intransitives and 
transitives or causatives; 

* numerous significant innovations in the lexicon. 

Even considering the late attestation of the Tocharian branch, the extent of 
structural change is surprisingly large, and it can be argued that this is partly 
due to a substrate effect. The loss of the distinction between the so-called 
voiceless, voiced and voiced aspirated stops, the rise of agglutinative case 
inflection, and the functions of these case suffixes, which include the perlative, 
denoting movement through, along or over something, point to Uralic 
influence. A pre-Proto-Tocharian phase of the vowel system can be compared more 
specifically with an early form of Samoyedic. Pronoun suffixes attached to the 
finite verb denoting the object may be compared with the objective inflection in 
Uralic (Peyrot 2019a with references; on the vowel system, see Warries in 
press). 

It is more difficult to assess the Iranian impact on Tocharian. There has been 
considerable Iranian influence on the lexicon (Isebaert 1980; Tremblay 2005), 
but only the oldest layer of borrowings from Old Iranian may possibly be added 
as a branch-defining feature of Tocharian. The reason is that any feature 
defining the whole branch should have been acquired before the break-up of 
unitary Tocharian into Tocharian A and B. This is clearly the case with the 
structural shift attributed to Uralic above. However, many borrowings from 
Iranian are to be dated after the break-up and therefore do not define the 
Tocharian branch as such. Examples of this include borrowings from 
Bactrian, such as Toch.B akālk and Toch.A ākāl ‘wish’ from Bactrian 
αγαλγο /aγalg/: the ā ā vocalism of Tocharian A, instead of the ā a vocalism 
regular in inherited vocabulary, shows that the word has entered the language 
later, and the Toch.B and Toch.A forms cannot be reconstructed to a common 
proto-form. Bactrian influence is therefore to be dated after the split of Proto- 
Tocharian. The case of borrowings from Old Iranian is different. An example is 
Toch.B perne, Toch.A paräṃ ‘glory’, which allows a Proto-Tocharian 
reconstruction *perne, borrowed from Old Iranian *farnah- (Av. xᵛarənah-). 
Nevertheless, for the Old Iranian layer, the details are not fully clear either. 
Tocharian B would have preserved a word like *perne unchanged, and the 
amount of change in Tocharian A is limited: *e > a in the first syllable; apocope 
of *e in the final syllable; ä-epenthesis in the final cluster -rn. Since these 
changes in Tocharian A cannot be dated exactly, it cannot be excluded that 
*farnah- was borrowed into Tocharian B and A independently, at an early 
stage, before the relevant sound changes in Tocharian A occurred but after the 
break-up of Proto-Tocharian. A reason to consider this more complicated 
chronology is provided by the sound changes *rn > rr and *ln > ll in both languages. 
Good examples of the former are not found in Tocharian A, but the latter is 
certain. Since old geminates are generally simplified in Tocharian A, the rise of 
new geminates from *rn and *ln must be dated after the general simplification 
of geminates. The preservation of rn in ‘glory’ thus suggests an early but post- 
Proto-Tocharian borrowing according to the following relative chronology: 
1. break-up of Proto-Tocharian; 
2. degemination in pre-Tocharian A; 
3. assimilation of *rn, *ln to rr, ll (the same change occurred independently in 
pre-Tocharian B); 
4. borrowing of *farnah- as *perne (the same borrowing occurred independently 
in pre-Tocharian B); 
5. *e > a, apocope of final *e, and ä-epenthesis to produce Tocharian A paräṃ. 
Another indication of this chronology is offered by Toch.B etswe ‘mule’, 
borrowed from Old Iranian *atswa- ‘horse’ (Av. aspa-). Although Toch.B 
matstsa-, Toch.A nätswa- ‘starve’ shows that Proto-Tocharian *tsw has 
developed to tsts in pre-Tocharian B after the break-up of Proto-Tocharian, 
etswe has tsw unchanged, suggesting that the borrowing is post-Proto-Tocharian. 
Old Iranian borrowings can only be taken as a branch-defining feature if the 
preservation of the cluster tsw in Tocharian B, and of the cluster rn in both 
languages, receives an alternative explanation, notably a conditioning of the 
relevant assimilations, such as a difference in accent. 
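
The relative chronology set out above can be replayed mechanically. The following Python sketch is only an illustration under simplifying assumptions: the function names (`degeminate`, `assimilate`, `late_changes`, `derive`) and the regular expressions are hypothetical stand-ins for the sound laws of steps 2, 3 and 5, not a full formalisation of them. It suggests that *perne keeps its cluster rn only if it enters pre-Tocharian A after the assimilation of step 3.

```python
import re

def degeminate(word):
    # Step 2: simplify old geminates (toy rule: any doubled letter).
    return re.sub(r"(.)\1", r"\1", word)

def assimilate(word):
    # Step 3: *rn > rr and *ln > ll.
    return word.replace("rn", "rr").replace("ln", "ll")

def late_changes(word):
    # Step 5: *e > a in the first syllable (toy rule: first e in the word),
    # apocope of final *e, ä-epenthesis in the final cluster -rn.
    word = re.sub(r"e", "a", word, count=1)
    word = re.sub(r"e$", "", word)
    return re.sub(r"rn$", "rän", word)

# Steps 2, 3 and 5 in chronological order
# (steps 1 and 4 are events, not sound changes).
STAGES = [degeminate, assimilate, late_changes]

def derive(word, entry):
    """Apply every stage from index `entry` onwards to `word`."""
    for change in STAGES[entry:]:
        word = change(word)
    return word

# Borrowed at step 4, i.e. after degemination and assimilation:
print(derive("perne", entry=2))  # parän
# Counterfactual: inherited from Proto-Tocharian, so all stages apply:
print(derive("perne", entry=0))  # parr
```

The output `parän` is a phonological form; the attested Tocharian A spelling is paräṃ. The counterfactual `parr` is not an attested word and serves only to show what the chronology predicts for an older *rn.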


6.3 The Internal Structure of Tocharian 


As Tocharian A cannot be derived from Tocharian B or vice versa, a common 
ancestor called Proto-Tocharian needs to be reconstructed. For instance, Toch.B 
yente ‘wind’ cannot have yielded Toch.A want ‘wind’, and the reverse is also 
impossible: a preform *wente is to be posited, with innovations in both 
languages leading to the attested forms. There is no need to discuss the internal 
subgrouping of Tocharian, since only one tree is possible. The dating of Proto- 
Tocharian, the only node in this tree, will be discussed below. Even though 
Burrow's hypothesis of a third Tocharian language is too uncertain to be taken 
into account for inferences on the prehistory of Tocharian, it presents an 
illustrative case for the methodology of internal subgrouping. 

The Gandhari words in the documents from Niya for which Burrow (1935) 
suggests a Tocharian etymology are few, and among these only two are relevant 
here: kitsa’itsa, a title, and amklatsa, a type of camel. Kitsa’itsa has a very 
Tocharian-looking structure and has been convincingly connected to Toch.B 
ktsaitse ‘old’, Toch.A ktsets ‘perfect’ by Burrow, who suggests ‘elder’ for the 
Gandhari title. Toch.B ktsaitse derives from PToch. *kətsaitstse with degemination 
after a diphthong,² and Toch.A ktsets has undergone apocope of final -e and 
monophthongisation of *ai to e; both languages have syncopated the *ə in the 
first syllable. Niya kitsa’itsa could derive from Tocharian B as well as pre- 
Tocharian A or a third branch and is therefore useless for subgrouping. It could 
reflect an older form of the type *kətsaitstse with i for ə and the regular Gandhari 
final -a for the regular Tocharian final -e. The geminate could be simplified or 
left unwritten. Equally, it could go back to a form of the type Toch.B ktsaitse, 
with i-epenthesis in the first syllable. Since Tocharian A is attested from the 
seventh century onwards, much later than Niya Gandhari, which is from the 
third-fourth centuries, it could also derive from an early form of Tocharian A in 
which monophthongisation of *ai to e had not yet taken place. 

The key form for Burrow's understanding of the internal subgrouping is 
amklatsa (1935: 673). According to him, amklatsa denotes a relatively cheap 
camel, which may therefore have been untrained. He connects the word to 
Toch.B aknatsa, Toch.A aknats ‘fool’, which is formed with the negative prefix 
*en- from the verb *kna- ‘to know’: in both languages, the vowel of the prefix 
has been affected by a-umlaut, and its nasal has been lost before the cluster kn-. 
To explain the different cluster mkl in Niya Gandhari, he assumes that it goes 
back to an earlier form with *nkn that was dissimilated to nkl, written mkl. 
Since the first n of the cluster is lost in both Tocharian A and B, he concludes 
that the Tocharian variety he assumes in the Gandhari of Niya is of a different 
branch, and this is the reason why it is often termed “Tocharian C”. 

Burrow's Tocharian etymology of Niya Gandhari kitsa’itsa is attractive, but 
his explanation of amklatsa is not convincing in view of the semantic and formal 
problems. At any rate, this questionable etymology can never alone bear the 
weight of proving a third branch of Tocharian, the famous “Tocharian C”. 


2 See Peyrot 2008: 45 on Toch.B -auñe < *euññe with the same degemination. Adams cites the 
word as ktsaitstse (2013: 263), but this form is not attested. 

3 Even in the unlikely event that the etymology should be correct nevertheless, it does not 
necessarily prove the existence of a third branch of Tocharian. Rather than being a shared 
innovation of Tocharian A and B, the change *nkn > kn may be a parallel development, since 
there are cases where the nasal is lost in Tocharian A but preserved in Tocharian B (Hilmarsson 
1991: 193-8). 
Rather, in the light of research by Niels Schoubben, who proposes new and 
convincing alternative explanations for some other items that Burrow explained 
as Tocharian (Schoubben 2021), scepticism about Burrow's hypothesis is defin- 
itely due. 

No absolute date can be given for Proto-Tocharian, by definition the latest 
phase of unity before the break-up in pre-Tocharian A and pre-Tocharian 
B. The languages are closely related, but differences are considerable in the 
lexicon, and most scholars estimate Proto-Tocharian around 500 BCE: some 
take it to be a little bit earlier, between 1000 BCE and 500 BCE; others a little 
bit later, between 500 BCE and the beginning of the Common Era (see the 
useful overview of different estimates in Mallory 2015: 7-8). 

It is commonly agreed that the advent of Buddhism was after the break-up, as 
such basic terms as dharma ‘law’ (Toch.B pelaikne, Toch.A märkampal) and 
karman ‘act, fate’ (Toch.B yamor, Toch.A lyalypu) are different (Lane 1966). 
But since Buddhism arrived late in the region, perhaps in the first or second 
century CE, this gives only an unsurprising ante quem date. 

Contacts with the Iranian languages Bactrian and Sogdian took place after 
the split, probably in the early first millennium CE. Contacts with Old Iranian 
are more interesting: since it can be debated whether they occurred before or 
after the break-up, they may have to be dated close to that break-up. In the 
scenario sketched above, they would have occurred soon after it.⁴ 
However, the Old Iranian loanwords are themselves difficult to date in 
absolute terms. The archaic appearance of words such as Toch.B etswe 
‘mule’ ← OIrn. *atswa- ‘horse’ (Av. aspa-) or Toch.B waipecce ‘possessions’ 
← OIrn. *hwai-padya- (Av. xᵛaēpaiθiia- ‘own’) suggests a date in the 
middle of the first millennium BCE or earlier, but a more precise dating is 
difficult. I have suggested that these loanwords may be associated with the 
presence of Andronovo-related groups in Northern Xinjiang in the thirteenth–ninth 
centuries BCE (Peyrot 2018: 280), which would accordingly push the 
date of Proto-Tocharian towards the beginning of the first millennium BCE. 
The assumed contacts with Uralic, which may date to around 2500 BCE, in 
any case took place long before the split, in a pre-Proto-Tocharian phase. 

Archaeological evidence on the Tocharians themselves is at present not clear 
enough (Mallory 2015: 29 and passim). It is uncertain whether the 
Cháwúhūgōu cultural group near Qarasähär (Debaine-Francfort 1989: 183–9), 
whose different phases together cover almost the entire first millennium 
BCE, can be identified with early speakers of Tocharian A, or whether the 
Hālādūn cultural group of the early first millennium BCE in and near Kuča 


4 It is tempting to consider the possibility that the apparently impressive technological advances 
brought by the Iranians speaking this Old Iranian language were the impetus for the split of 
Proto-Tocharian. At present no evidence for or against this scenario seems to be available. 
(Debaine-Francfort 1988: 23) can be identified with early speakers of 
Tocharian B. Accordingly, archaeological evidence for the date of Proto- 
Tocharian or the place where it was spoken is presently indirect at best. 


6.4 The Relationship of Tocharian to the Other Branches 


It is now commonly held that Tocharian has no closer affinity to any other 
branch of Indo-European.⁵ Proposals for closer affinity have been made but 
have found little acceptance and concern superficial similarities, such as the 
spread of the n-stems in the nominal inflection, which would be shared with 
Germanic (Adams 1988: 5), or the endings in -r of the middle, suggesting a link 
with Italo-Celtic (e.g. Lane 1970: 78, who attributes the correspondence to 
post-Proto-Indo-European contact), and so on. References to and discussion of 
these and other suggestions can be found in Hackstein 2005 and Malzahn 
2016: 281. 

Not accepting any of the adduced old comparisons, Hackstein (2005) pro- 
poses instead several close matches between Tocharian and other branches in 
grammaticalisation processes. According to him, the observed grammaticalisa- 
tion processes are independent and parallel instead of shared, and indicate post- 
Proto-Indo-European contact. The matches that he proposes are with Latin, 
Slavic, Gothic, Greek and Armenian. Although the cases discussed are inter- 
esting, the large number of languages in the comparison makes it unlikely that 
the parallelisms are due to early contact. In addition, it is open to debate 
whether the parallelisms, if correctly identified, are indeed so salient that they 
cannot have come about completely independently. For instance, the univerba- 
tion of interrogative and demonstrative in Toch.B k,se ‘who’ <*k"i+ so, in Alb. 
kush ‘who’ < *k"is + so, and in OCS koto ‘who’ with -to from PIE *tod 
(Hackstein 2005: 177) has not proceeded in exactly the same way; it probably 
compensates, at least in part, for the loss of inflection and word weight; and it 
appears to be a natural process. Toch.B s and spä ‘and’, which Hackstein 
derives from *hjeti and *h;eti-h;epi respectively, in fact represent one and 
the same etymon *spa with simplification of sp to s in classical and late 
Tocharian B (Peyrot 2008: 68) so that s cannot be directly compared with 
Latin et or Gothic ip (pace Hackstein 2005: 176). 

A different case is presented by matches with Anatolian, of which several 
have been proposed that appear to be fairly solid: see for instance Pinault 
2006a: 93. These must be archaisms, not showing any closer affinity between 
Anatolian and Tocharian, and are potentially relevant to establish the position 
of Tocharian in the tree of Indo-European, discussed in the following section. 


5 The prolonged contact with Iranian and the shorter but dramatic impact of Indic are obviously to 
be discarded as secondary. 
6.5 The Position of Tocharian 


Tocharian is often claimed to have been the second branch to split off the Indo- 
European proto-language: after Anatolian, but before all other attested 
branches. This hypothesis may be called the “Indo-Tocharian” hypothesis, 
based on the model of Indo-Anatolian (Peyrot 2019b; see Figure 6.1). “Indo- 
Anatolian”, equivalent to “Indo-Hittite”, is used here in a technical sense for 
the highest node in the Indo-European tree, before Anatolian split off as the first 
branch, a scenario for which the evidence is steadily growing (cf. Kloekhorst & 
Pronk 2019).⁶ Strikingly, the arguments that have been advanced in support of 
the “Indo-Tocharian” hypothesis vary considerably: many authors making the 
same claim do not accept each other's evidence for their claim. The most 
comprehensive systematic review is that by Ringe (1991), who finds hardly 
any evidence for the position of Tocharian in the family tree at all. Other 
relevant contributions include Lane 1970, Schmidt 1992, Winter 1997, 
Pinault 2013 and Malzahn 2016. 

Below, a selection of arguments will be discussed. In general, it appears that 
aberrancies of Tocharian are due to innovation, and careful reconstruction tends 
to bring Tocharian closer to non-Anatolian Indo-European. The Indo-Tocharian 


Figure 6.1 The position of Tocharian (tree diagram; node labels: Indo-European, 
Indo-Tocharian, Italo-Celtic; branches: Anatolian, Tocharian A, Tocharian B, 
Italic, Celtic, Germanic, Greek, Armenian, Albanian, Indo-Iranian, Balto-Slavic) 


6 Jasanoff (2017: 233-4) explicitly subscribes to this scenario but rejects the term “Indo-Hittite” 
because it “acquired tendentious overtones” (p. 233). 
hypothesis still seems attractive, but evidence is slim and the difference between 
Indo-Anatolian and Indo-Tocharian appears to be much larger than that between 
Indo-Tocharian and the other Indo-European languages. If Indo-Anatolian can be 
dated to the middle of the fifth millennium BCE, Indo-Tocharian must be much 
closer to the middle of the fourth millennium. As pointed out to me by Tijmen 
Pronk, the split-off of the Tocharian branch (Anthony 2007: 305, 307-11; 
Anthony & Ringe 2015: 208, 211) may be associated with the apparent abandonment 
of the Caspian steppe in 3500–3400 BCE, probably due to abrupt 
aridification (Shishlina 2008: 220). 


6.5.1 Methodology 


In view of the many different arguments that have been proposed for the Indo- 
Tocharian hypothesis, a brief note on the methodology seems in order. 

It is generally agreed that the assumption of an early Tocharian split-off must 
be based on shared innovations of the other non-Anatolian Indo-European 
languages. In particular, the branch that split off after Tocharian should have 
shared in such innovations. As the most likely candidate for the branch to have 
split off third appears to be Italo-Celtic, the supposed shared innovation should 
ideally be attested in this branch. Conversely, arguments for Indo-Anatolian 
should be based on shared innovations of non-Anatolian Indo-European, 
ideally attested also in Tocharian (Peyrot 2019b). 
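
The logic of this paragraph can be sketched as a small decision procedure. The Python function below is a hypothetical illustration, not an established method: it assumes the split order of Figure 6.1 (Anatolian first, Tocharian second, Italo-Celtic third), and the name `supported_node` and its return labels are invented for this sketch. Given the set of branches in which an innovation is attested, it reports which hypothesis the innovation could in principle support.

```python
def supported_node(attested_in):
    """Return the hypothesis an innovation could support, given the
    branches where it is attested (assumes the split order stated above)."""
    branches = set(attested_in)
    if "Anatolian" in branches:
        # Present in the first branch to split off: not diagnostic here.
        return "not diagnostic"
    if "Tocharian" in branches:
        # Shared by non-Anatolian branches, including Tocharian.
        return "Indo-Anatolian"
    if "Italo-Celtic" in branches:
        # Shared by the remaining branches, including the third to split off.
        return "Indo-Tocharian"
    # Absent from Italo-Celtic as well: the innovation may simply be later.
    return "inconclusive"

print(supported_node({"Tocharian", "Italo-Celtic", "Greek"}))
print(supported_node({"Italo-Celtic", "Greek", "Indo-Iranian"}))
print(supported_node({"Greek", "Indo-Iranian"}))
```

The sketch only encodes where an innovation is attested; whether the feature really is an innovation, rather than an archaism or a parallel development, still has to be argued case by case.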

Though clear in theory, in practice finding and defining shared innovations is 
difficult. There appear to be the following requirements for shared innovations 
useful for phylogenetic subgrouping: 

* identifiability: the linguistic element adduced as a shared innovation in the 
lower node should be clearly identifiable in the higher as well as in the lower 
node; 

* unidirectionality: the observed difference with regard to the selected 
linguistic element should be interpretable as a unidirectional change; 

* salience: the observed change should be so salient that it is unlikely to have 
occurred independently in the supposed lower-node branches, in which case 
it would be a parallel, not a shared innovation. 

The requirement of unidirectionality is widely accepted, and discussion tends 
to focus on the question as to whether a given difference can be interpreted as 
a unidirectional change, rather than the need for this requirement as such. A case 
in point is semantic change: phylogenetic arguments based on semantic change 
are often contested on the grounds that a given semantic difference is not 
necessarily due to unidirectional change. 

The requirement of identifiability, often implicit, may be helpful in discus- 
sions about debated phylogenetic arguments based on the loss or addition of 
features, or on lexical replacement. Arguments based on loss or addition are 
notoriously difficult, as for instance with the comparative and superlative 
suffixes, which are unattested in Anatolian and Tocharian: have they been 
lost in both branches, or were they added after the Tocharian split-off? Such 
arguments cannot be applied if the supposedly added feature cannot be identified 
with any prestage leading to it or if the lost feature has left no trace at all. 
Arguments based on lexical replacement are weak because the identifiable 
element would be the meaning, expressed with different etyma in two branches. 
Meaning is difficult to use as an identifiable element, because several etyma 
may have similar, overlapping or even identical meanings, and it is therefore 
difficult to prove that a certain meaning came to be expressed with a different 
etymon. 

The requirement of salience seems so obvious that no further explanation is 
needed. 


6.5.2 Phonology 


For our present purposes, phonological evidence appears to be of little rele- 
vance in view of the extensive changes in the Tocharian sound system, which 
are probably due to a Uralic substrate (see Section 6.2). In particular, evidence 
for the phonetic realisation of the stops in the proto-language has been obscured 
by this substrate effect. Thus, there is little evidence to establish the position of 
Tocharian with relation to Kloekhorst's claim that Anatolian preserves an older 
system of stop distinctions (2016), with classical PIE *t, *d, *dʰ from 
Proto-Indo-Anatolian *tː, *ˀt, *t. 

For Tocharian, the developments of *d and *bʰ are notable. PIE *d is by 
default represented with ts and is otherwise often lost, at least before *i, *u and 
*r, and so differs from *t and *dʰ, whose default outcome is Tocharian t.⁷ Thus, 
even though the exact phonetics remain difficult to establish, *t and *dʰ were 
apparently closer to each other than either of them was to *d.⁸ At the same 
time, *bʰ is lost after *m, for instance in *ǵombʰo- > Toch.B keme ‘tooth’, while 
*p stays, for instance in *temp- (Lith. tempiu ‘stretch’) > Toch.B camp- ‘be 
able’.⁹ This suggests that *bʰ was weaker than *p: it may have been voiced, 


7 Also, the palatalised reflex of *d is ś, while that of *dʰ and *t is c. 

8 Possibly, this distribution also holds for the assibilated variant -ṣ < *-ti, *-dʰi (Jasanoff 1987), 
although good evidence for the development of *-di is thus far lacking. 

9 With original *mt, compare also *(d)ḱm̥tóm ‘100’ > Toch.B kante. Parallel cases with *dʰ, *ǵʰ, *gʰ 
and *gʷʰ are not readily available. Ringe discusses the possibility that the Toch.B subj. stem of lət- 
‘go out’ as in the inf. lantsi shows /lən-/ < *h₁lu-n-dʰ- (1996: 43). However, he notes that forms 
with a geminate nn like 1sg. lannu ‘I will go out’ rather suggest an original *lantn-. Indeed, all 
forms with a nasal in Toch.B can probably be derived from lann- < *lantn-, which arose 
secondarily through suffixation with *-nask- in the present (Peyrot 2013: 446). Toch.B lankᵤtse 
‘light’ < *h₁lengʷʰ-u- shows that *gʷʰ was not lost after *n. It may be supposed that *gʰ and *ǵʰ 
were not lost after *n either. The reason for this exception could be that there was no corresponding 
velar nasal phoneme and the velar stop had to remain in order to keep the velar nasal allophone. 
fricative or both. It is tempting to compare the typologically common loss of 
voiced stops after nasals, as in English lamb /læm/, and posit the value [b] for 
*bʰ, but this is certainly not the only option. Combining the evidence from 
dentals and labials, it appears that the stop system inherited by Tocharian had 
strong stops for the conventional voiceless stops like *t, weak stops for the 
conventional voiced aspirated stops like *dʰ, and a series that was different 
from both for the conventional voiced stops like *d.¹⁰ Although Tocharian 
offers no direct evidence for the reconstruction of glottalic stops in 
Proto-Indo-European, the fact that *d has a different reflex from *t and *dʰ is neatly 
compatible with it, since under Kortlandt's glottalic theory (e.g. 1985; 2018a) 
*d [ˀd] on the one hand is set apart from *t and *dʰ on the other. 
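
The grouping argued for here can be made explicit with a small lookup. The Python sketch below is only an illustration: the dictionary `DEFAULT_REFLEX` records the default Tocharian outcomes of the dental stops as stated in the text (other outcomes of *d, such as its loss before *i, *u and *r, are left out), and the helper name `group_by_reflex` is hypothetical. Grouping the Proto-Indo-European stops by outcome sets *d apart from *t and *dʰ.

```python
from collections import defaultdict

# Default Tocharian reflexes of the PIE dental stops, as given in the text.
DEFAULT_REFLEX = {"*t": "t", "*d": "ts", "*dʰ": "t"}

def group_by_reflex(reflexes):
    """Group source stops by their outcome."""
    groups = defaultdict(list)
    for stop, outcome in reflexes.items():
        groups[outcome].append(stop)
    return dict(groups)

print(group_by_reflex(DEFAULT_REFLEX))
# {'t': ['*t', '*dʰ'], 'ts': ['*d']}
```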

Nevertheless, the value for the phylogenetic position of Tocharian remains 
undecided. Since there is strong evidence for *d = *ˀd in classical 
Indo-European, this feature cannot be used. Further, the position of Tocharian cannot 
be determined with regard to Kloekhorst's claim that classical PIE *t (perhaps 
phonetically [t]) < Proto-Indo-Anatolian *tː and classical PIE *dʰ (perhaps 
phonetically [d]) < Proto-Indo-Anatolian *t, since both phonetic stages are 
compatible with *t being stronger and *dʰ being weaker. 

It has been argued that Tocharian shows consonantal reflexes of PIE *H as 
k (e.g. Winter 1965: 206-10; Schmidt 1988; Kortlandt 2018b). Winter adduces 
Tocharian A “intrusive k” as a consonantal reflex of *H, e.g. gen.pl. lwākis to 
nom.-obl.pl. lwā ‘animals’ or perl.pl. puklākā to nom.-obl.pl. puklā ‘years’. 
However, k must be secondary in such examples because it effectively prevents 
the problematic vowel contractions in the morphologically expected forms 
**lwes < *lwā.is (next to attested gen.sg. lwes!) and **puklā < *puklā.ā. 
Schmidt (cf. also Hartmann 2001) has argued that the k in roots ending in -tk 
goes back to *h₂, but Melchert's earlier derivation of -tk- from -T-sk- is 
definitely to be preferred (1977; cf. also Pinault 2006b). Kortlandt's derivation 
of Tocharian B taka- ‘be’ from *steh₂-t with -k- from *h₂ is in itself attractive, 
but since the “k-aorist” is also attested in e.g. Gr. ἔθηκα and Lat. feci, this reflex 
cannot be used to determine the phylogenetic position of Tocharian, even if the 


10 A thorough discussion of these developments can be found in Ringe 1996: 39-66 and 
Winter 1962. In both accounts, a complicating factor is the Tocharian version of 
Grassmann’s Law, exemplified by e.g. Toch.B tsik- ‘form’ < *dʰeiǵʰ- and tsak- ‘burn’ 
< *dʰegʷʰ-, allegedly with ts < *d after *dʰ had been deaspirated to *d before the 
following *ǵʰ and *gʷʰ respectively. The evidence for Grassmann’s Law in Tocharian 
is circumstantial and probably open to an alternative explanation. It is not taken into 
account here in view of the solid counterexample of Toch.B tapre ‘high’ < *dʰubʰro-, to 
be reconstructed with *bʰ instead of *b after Kroonen (2011: 253, 255). 

11 It is possible that Tocharian inherited a stop system in which distinctive voice had not yet 
developed, as argued by Kortlandt (e.g. 1985: 197; 2020: 269), but in my view this is difficult to 
prove. 
evidence as such nicely fits Kloekhorst's reconstruction of *h₂ and *h₃ as 
uvular stops for Proto-Indo-Anatolian (2018; cf. also Kortlandt 2002: 218). 

Like other Indo-European languages, Tocharian shows reflexes of metathesis 
of *Hi to *iH and *Hu to *uH. For instance, metathesis of *Hu to *uH is 
attested by such forms as Toch.B puwar ‘fire’ < *puh₂r¹² (as in Greek πῦρ) 
from earlier *peh₂-ur (as in Hitt. pahhur) and Toch.B ləw(a)- ‘rub’ (prt.3sg.- 
3sg.obj. lyawa-ne ‘he rubbed him’) < *leuh₃- from earlier *leh₃u- (as in Hitt. 
lāhu-ⁱ ‘pour’). Even though unmetathesised forms are also found, for instance 
Toch.B kaw- ‘kill’ < *keh₂u-, the existence of metathesised forms in Tocharian 
clearly shows that this sound change is to be dated before Tocharian split off. 
However, even though Hittite often shows unmetathesised forms next to 
metathesised forms elsewhere (Kloekhorst & Pronk 2019: 5), the metathesis 
must have already occurred before Proto-Indo-Anatolian on the evidence of 
forms such as Hitt. šuhha- ‘pour, sprinkle’ < *suh₂- next to ishu(wa)- < *seh₂u- 
and lu-u- ‘pour’ < *luh₃- next to lahu- < *leh₃u- (Melchert 2011: 129, 131). At 
this point, therefore, the mere attestation of laryngeal metathesis cannot be used 
for inner Indo-European phylogeny. 

However, another Indo-European metathesis may be used: that of word-final 
*-ur to *-ru (Lubotsky 1994: 99-100). This sound change seems to have 
occurred only after Proto-Indo-Anatolian. Strong evidence for it in Tocharian 
has been discovered by Del Tomba (2021), who shows that Toch.B plurals 
in -wa to nouns in -r, such as tarkär ‘cloud’, pl. tärkarwa, presuppose 
metathesis of *-ur to *-ru in the singular, on which the plural -r-wa < *-ru-h₂ 
was built. Although this sound change may be used for the phylogeny of 
Indo-European, it clearly groups Tocharian together with the non-Anatolian 
languages. 


6.5.3 Morphology 


Morphology is the domain that is often ascribed the highest potential to yield 
evidence for the phylogenetic position of Tocharian. Indeed, morphology 
meets two essential needs: it is constantly in the process of change, and, at 
the same time, shifts in function, though commonplace, are subject to 


12 The Tocharian word for ‘fire’ is variously reconstructed. Hackstein, for instance, reconstructs 
*ph₂uōr (2017: 1314). It is, however, questionable whether *h₂ would be lost in this context, and 
whether the reconstruction of a collective ending *-ōr for this etymon is warranted. A derivation 
of Toch.B puwar from *puh₂r is the most straightforward. Winter (1965: 192) reconstructs the 
Tocharian A equivalent por as *paur from unmetathesised *peh₂-ur. This is phonologically 
possible but most difficult morphologically, since it is not clear what the distribution of these 
variants in the Proto-Tocharian paradigm might have been. It is therefore preferable to assume 
a development *wa > o similar to *we > o in kom, obl.sg. of ku ‘dog’, < *kwen and *iye > e in 
karemam ‘laughing’ < *keriyemane (Hilmarsson 1989: 135; Hackstein 2017: 1314; Peyrot 
2012a: 210). 
limitations. Unfortunately, Tocharian morphology is heavily reorganised and 
its prehistory is often very obscure. Even worse is the fact that the reconstruction 
of Proto-Indo-European in exactly the relevant points is difficult and often 
disputed. 

Without a doubt, the most prominent argument for phylogeny based on 
morphology that has been advanced comes from the Tocharian s-preterite. In 
the active of the Tocharian s-preterite, an element s is only found in the 3sg.: 
1sg. prek-uwa ‘asked’, 2sg. prek-asta, 3sg. prek-sa, 1pl. prek-am, 2pl. prek-as*, 
3pl. prek-ar. This is reminiscent of the Hittite hi-preterite, which likewise has -š 
only in the 3sg.: 3sg. akkiš ‘died’, 3pl. aker (Pedersen 1941: 146). There are 
two schools of thought to explain this correspondence. The first, most prominently 
voiced by Jasanoff (e.g. 2003: 204–5),¹³ takes the restriction of the -s- as 
an archaism of Anatolian and Tocharian, while the rise of the classical s-aorist 
through generalisation of the -s- from the 3sg. throughout the whole paradigm 
is a common innovation of the other Indo-European branches. According to 
the second one, the -s in Hittite is secondary, probably somehow from the 
s-aorist, while in Tocharian the s-preterite forms without -s- lost it due to the 
effects of sound law and analogy (Ringe 1990; Kortlandt 1994; Peyrot 2013: 
503-7). The matter cannot be treated here in detail. Suffice to say that the 
assumption of loss of -s- accounts best for the inflection of the Tocharian 
preterite and its patternings with the subjunctive. At any rate, this famous 
case very clearly shows how different views on the reconstruction of Proto- 
Indo-European logically lead to different evaluations of arguments for 
phylogeny. 

Another phylogenetic argument is based on the middle endings in -r (e.g. 
Ringe, Warnow & Taylor 2002; Ringe 1991: 98–9). It is widely held that the
shorter middle endings 3sg. *-to and 3pl. *-nto were secondary endings in 
Proto-Indo-European, while the corresponding primary endings were origin- 
ally 3sg. *-to-r, 3pl. *-nto-r, which were later replaced by 3sg. *-to-i, 3pl. 
*-nto-i, marked with the productive primary marker *-i as found in the active 
endings. This would not be a valid argument for Indo-Tocharian, since the 
r-endings are also found in Italo-Celtic and Phrygian, but it would group 
Tocharian with the older branches. 

However, a number of problems with this argument need to be noted: 

* It is questionable whether the contrast between Toch.B pres. 3sg. -tär,
3pl. -ntär and pret. 3sg. -te, 3pl. -nte continues an original primary-second- 
ary contrast, because the Tocharian preterite active endings do not continue 
the secondary endings of Proto-Indo-European. In the copula 3sg. ste, 3pl. 
skente, the endings -te, -nte are even used as present endings. Hackstein 
(1995: 273-5) explains these forms as original resultatives, i.e. “is” < “has 


13 Two more recent contributions are Melchert (2015) and Jasanoff (2019). 
become”, and notes that presentic readings of preterites are found elsewhere.
However, it remains problematic as to why no shade of the past meaning has
been preserved in 3sg. ste, 3pl. skente, and why the corresponding suffixed
forms, such as 3sg.-1sg.obj. star-n, have present endings. This distribution is
difficult to explain from an original difference in tense.

* The reconstruction of the primary middle endings 3sg. *-to-r, 3pl. *-nto-r is 
problematic itself. Indeed, Lat. -tur, -ntur point to *-tor, *-ntor. However, as 
Weiss (2009: 413) notes, Osc. 3sg. -ter, 3pl. -nter point to *-tro, *-ntro, and 
Umb. primary -ter, -nter vs. secondary -tur, -ntur suggests Proto-Italic primary 
*-tro, *-ntro vs. secondary *-tor, *-ntor. Likewise, the Old Irish deponent 
endings 3sg. -thir, 3pl. -tir point to *-tr-, *-ntr-, probably *-tro, *-ntro.
Finally, Toch. -tär, -ntär cannot be derived from *-tor, *-ntor directly (cf. also 
Pinault 2010b). Ringe (1996: 86) discusses the change of *-or to Toch. *-ar, but 
the 3rd person middle endings are his only evidence, against counterexamples 
such as Toch.B malkwer ‘milk’, with suffix -wer < *-uor as in the verbal 
abstract, e.g. sesuwer ‘eating’. A further counterexample seems to be yerter 
‘felloe’, which on the evidence of the unpalatalised -t- must reflect *-tor.¹⁴

* The assumed replacement of well-marked middle paradigms ending in -r 
with the active marker -i is difficult to understand. What would be the 
motivation to do so? If endings are clearly marked to be primary, there 
seems no need to replace them. The greatest difficulty here is not the 
addition of the primary active marker -i – such additions are indeed found
frequently in e.g. the perfect endings, such as OCS vede, Lat. vidi, or
Toch.A kärse ‘I knew’ < *karsa-a-i – but the fact that the transparent
middle ending *-r should have been deleted. 

In view of these problems, it is tempting to follow Kortlandt's reconstruction
(1981) of the middle endings as *-to, *-nto only, without contrast between
primary and secondary endings.¹⁵ Apparently such contrasts were created
independently in the different branches. In any case, the problematic specifics
of the reconstruction of the middle endings make them difficult to use for
phylogeny.

Another argument advanced by Ringe, Warnow & Taylor (2002: 117) is the 
thematic optative in *-o-ih₁-, attested in Indo-Iranian, Greek, Balto-Slavic and
Germanic, but not in Tocharian. Indeed, this may be a later innovation within 
Indo-European not shared by Tocharian. In Tocharian, there is only one variant 


14 A possible alternative reconstruction would be *-ewer with contraction of *ewe to e. 

15 The evidence of Anatolian seems compatible with an original *-to, *-nto without contrast
between primary and secondary endings: synchronically, they are attested in Hittite as pres.3sg. 
-tta, 3pl. -anta. However, a derivation from *-tor, *-ntor neatly explains the rise of the present 
particle -ri from resegmentation after the loss of -r after *-ó- (Yoshida 1990). The introduction 
of the particle -ti, to mark the preterite endings, i.e. 3sg. -ttati, 3pl. -antati, would be motivated in 
both scenarios. 
of the optative suffix, -’i- (i with preceding palatalisation), to be derived
from *-ih₁-.¹⁶ However, “present optatives”, synchronically imperfects, are
unattested in Tocharian A, and they must consequently have been regularised
secondarily in Tocharian B (Peyrot 2012b). Therefore, it is difficult to
prove that e.g. Toch.B pari* ‘he took’ goes back directly to *bʰer-ih₁-t (for
*bʰer-o-ih₁-t elsewhere). In any case, since the thematic optative is not
attested in Italo-Celtic, it cannot be used to show that Tocharian split off 
before that branch. 

It has been argued that the combination of the Tocharian present participle 
in -mane with both active and middle finite inflection is an archaism: the 
verbal adjective *-mh₁no- would originally have been indifferent for voice,
very much like the *-nt-participle in Anatolian (Kloekhorst & Pronk 2019: 3), 
and became specialised only later, after Tocharian split off, as the middle 
counterpart of the active *-nt-participle (Pinault 2012: 229; Peyrot 2017: 
339-40). However, I now think that this argument has to be abandoned in 
the light of a study by Friis (2021), who shows that traces of specifically 
middle use are preserved, which suggests that active use of -mane in 
Tocharian is secondary.¹⁷

A case from word formation in the grammatical domain is the interrogative 
stem in *m- found in Anatolian and Tocharian (Hackstein 2004: 280-3; Pinault 
2010a: 359; Peyrot 2019a: 195–9). A weak point of this argument is that the
innovation of the other Indo-European languages would consist only in loss of
the m-interrogative, while a strong point is the central position of this stem,
paired only with *kʷi- (*kʷe-, *kʷo-), in the linguistic system. Thus, while the
identifiability of this feature is low, its salience is nevertheless high. 


6.5.4 Lexicon 


Lexical evidence has been variously evaluated. Important papers adducing 
lexical evidence in support of an early split off of Tocharian are Schmidt 
1992 and Winter 1997. This evidence, and the method as a whole, was 
criticised by Hackstein (2005: 172) and Malzahn (2016) amongst others. For 
lexical arguments, a distinction should be made between lexical replacements 
and semantic change. 


16 The full grade variant **-ye- < *-ieh₁- may have been ousted by the zero grade variant through
paradigmatic levelling, but it is also possible that the zero grade variant was generalised from
s-aorist optatives with *-ih₁- throughout (if the synchronic optatives of root subjunctives of
class 1, such as Toch.B parsi ‘may he ask’, are to be derived from s-aorist optatives, i.e. in this
case *prék-s-ih₁-t).

17 Thus, even though I cannot agree with the arguments adduced by Fellner & Grestenberger 
(2018), I do now concur with their main claim. 
Arguments based on lexical replacement are especially difficult because the
identifiability requirement is not easily satisfied: it is hard to prove that two 
words did not carry the same or a similar meaning. An example of such an 
argument is Anatolian (i.e. Luvian) and Tocharian (i.e. Toch.A) *wel(H)- ‘die’ 
vs. *mer- elsewhere (Ringe, Warnow & Taylor 2002: 99).¹⁸ Although *mer-
indeed acquired the meaning ‘die’ from ‘disappear’ after Indo-Anatolian 
(Kloekhorst & Pronk 2019: 3), and thus became a new word for the meaning 
‘die’, the Luvian and Tocharian A words cannot be shown to represent the 
original word for ‘die’, let alone that it was ousted by the new *mer- (see 
Malzahn 2016: 285–6).

Another example is *h₁egʷʰ- ‘drink’, well attested in Tocharian and
Anatolian, as against *peh₃- elsewhere (Ringe, Warnow & Taylor 2002: 99).
This may indeed be a case of lexical replacement, i.e. the meaning ‘drink’ came
to be expressed by a different word. However, the details are complicated: Hitt.
pas-ⁱ ‘swallow’ shows that *peh₃- needs to be reconstructed for Proto-Indo-
Anatolian, with possibly only a slightly different meaning; and Lat. ebrius
‘drunk’ and Gr. νήφω ‘be sober’ show that *h₁egʷʰ- was preserved after
Tocharian split off, possibly with a shift to ‘be drunk’ (Peyrot 2019b). Thus,
the argument for lexical replacement remains fragile, while the best phylogenetic
evidence is formed by the possible semantic developments of ‘drink’ to ‘be
drunk’ for *h₁egʷʰ-, and ‘swallow’ to ‘drink’ for *peh₃-. The attestation of the
meaning ‘be drunk’ in Latin is favourable for the Indo-Tocharian hypothesis, 
because it suggests that this semantic change occurred after Tocharian split off, 
but before Italo-Celtic split off. 

As a lexical argument based on semantics, Ringe, Warnow & Taylor (2002: 
99) adduce *meǵh₂-, of which the Anatolian (e.g. Hitt. mekk-, mekki-) and
Tocharian (e.g. Toch.B mäka) reflexes mean ‘much, many’, as against ‘great’ 
elsewhere. The distribution is especially neat in this case, since the etymon is 
also attested in Italo-Celtic (OIr. maige, Lat. magnus, etc.) and Germanic
(Goth. mikils). Here, the main problem is the requirement of unidirectionality: 
the meanings are contingent and a change from ‘great’ to ‘much’ is by no 
means unlikely. 
7 Italo-Celtic


Michael Weiss 


7.1 Introduction 


Many scholars have noted similarities between Italic (Chapter 8) and Celtic 
(Chapter 9). Schleicher (1858) was the first to posit an Italo-Celtic node 
between Proto-Indo-European and Celtic and Italic.¹ But in the 1920s Carl
Marstrander and Giacomo Devoto questioned the validity of this subgrouping.²
Scholarly opinion has varied ever since. It would be fair to say that Italo-Celtic 
is more debatable than any other higher order subgrouping, certainly much 
more so than Balto-Slavic. 


7.2 Evidence for the Italo-Celtic Subgroup 


Many features once cited in favor of Italo-Celtic unity are now seen to be 
archaisms. For example, the medial r-endings (Lat. sequitur ~ OIr. sechithir
‘follows’) were in the nineteenth century only known from Italic and Celtic, but
the appearance of these endings in Anatolian (Hitt. mid.3sg. -ttari) and
Tocharian (Toch.B mid.3sg. -tär/-trä) completely changed this picture. It is
true, however, that it is only in Italic and Celtic that -r becomes a marker of
middle diathesis, and only Celtic and Latin have created a mid.1pl. *-mor.³ In
the other branches continuing *-r the suffix is limited to the primary middle
endings only: Hittite prim. -ttari : sec. -ttati; Toch.B prim. -tär : sec. -te.⁴
Another feature now known to be an archaism is the t-less 3rd singular medial
ending: OIr. berair ‘is carried’, Umb. ferar subj.mid.3sg. These forms are


1 But it is usually Lottner (1861) who is credited with first positing Italo-Celtic. In fact, Schleicher
beat him to it by a few years. Schleicher mentioned the r-middle forms, the ā-subjunctive, and the
ī-genitive as well as much other material that was just wrong. Lottner (1861) added the formation
of the superlative.

2 Devoto 1929; Marstrander 1929. Some key discussions of the issue of Italo-Celtic: Watkins
1966; Campanile 1968; Cowgill 1970; Jasanoff 1997; Schrijver 2016; Zair 2018; see also
Kortlandt 1981, 2007.

3 Note, however, that in Old Irish for the 1st plural imperative of deponent verbs r-less forms occur
in the glosses, e.g. seichem ‘sequamur’. See Thurneysen 1946: 37.

4 But note that the secondary middle endings were not completely eliminated. Lat. 2sg. -re
continues *-so and Venetic continued -to as a pret.act.3sg. ending (donasto ‘gave’).

matched by Hitt. -ari (esari ‘sits’) and relics in Vedic (áduha[t] ‘gave milk’). Of
course, archaisms like this do not provide positive evidence for subgrouping, 
but they aren’t completely uninteresting either. In the case of the primary 
marker *-r, we may note that the nearest groups to the east, Proto-Germanic, 
Proto-Balto-Slavic, Albanian, and Greek have all taken part in the innovation 
of replacing primary middle -r with primary active -i (e.g. Goth. haitada ‘is 
called’ < *-otoi, Arc. Gr. -τοι). The fact that the two most westerly branches
escaped this innovation may not be fortuitous.⁵

In the realm of phonology there are a small number of innovative features 
that have been proposed as shared Italo-Celtic developments, but these are all 
problematic. 

Both Italic and Celtic agree in the development of *CRHC to CRāC: Lat.
granum ‘a grain’ < *gr̥h₂nom vs. Goth. kaurn < PGmc. *kurna-,⁶ OIr. lám ‘hand’
< *pl̥h₂meh₂, but this apparent isogloss is complicated by the fact that both Italic and
Celtic show other outcomes for this sequence. In Italic *CRHC becomes CaRaC
under the accent, e.g. palma ‘palm of the hand’ < *palama < *pĺ̥h₂meh₂ (see
Höfler 2017). In Celtic the outcome CRaC is found in a number of examples, 
which cannot be easily explained as morphological neo-zero-grades, e.g. OIr.
flaith ‘rule’, MW gwlat ‘country’ < *u̯l̥h₂ti-.⁷ It is difficult therefore to believe that
the resolution of *CRHC sequences happened in Proto-Italo-Celtic. Note in
particular the disagreement between MW gwreid ‘roots’ < *u̯radi < *u̯r̥h₂dih₂
and the morphologically nearly identical Lat. radix ‘root’. 

A famous isogloss that does seem to hold up better is the long-distance 
assimilation of *p ... kʷ > *kʷ ... kʷ seen in Lat. quinque, OIr. cóic, OW pimp
‘five’ < *kʷenkʷe < *pénkʷe.⁸

Latin quercus ‘oak’ < *kʷerkʷu- < *perkʷu- (cf. Langobardic fereha ‘aesculus’,
Goth. fairguni neut. ‘mountain’) seems to show that in Italic the assimilation
*p ... kʷ > *kʷ ... kʷ preceded the change of *kʷu > *ku. But the Celtic
place-name Hercynia ‘oak forest’ < *perkunia seems to show that in Celtic the
*kʷu to *ku change preceded *p ... kʷ > *kʷ ... kʷ. Since there was no *kʷ to
trigger assimilation, *p developed regularly to Ø. This relative chronology,


5 Proto-Balto-Slavic may have taken part in this innovation since the athematic (active) endings go 
back to i-diphthongs (OPr. asmai ‘I am’, assei ‘you are’), which may originate in the primary 
middle endings, though this is controversial. But note that Slavic has retained relic forms that 
could go back to *-or in OCS keZedo ‘everyone’ < *k"os + *eido(r) ‘is expected’ (Majer 2012: 
230) and OCS lubo ‘or’ < *leub"o(r) ‘is wanted’ (Majer 2015). For Albanian see Schumacher 
2016: 386. For the potential relevance of archaisms retained by adjacent languages see Watkins’s 
discussion (1966: 30). 

6 OIr. grán and the other Celtic forms might be loanwords from Latin.

7 See Zair 2012: 69-89 for discussion. 

8 The Sabellic form for ‘five’ was *pompe, but strictly speaking it is not possible to determine
whether this is from *kʷenkʷe or *penkʷe. Venetic also probably had this change, as it would have
to if it is Italic, to judge from the Istrian ethnonym Quarqueni (Plin. 3. 130) ‘people of the oak
forest’.
taken at face value, suggests that the Italic and Celtic long-distance assimilations
were independent changes. If, however, the dissimilation of *kʷu- to *ku-
occurred already in Proto-Indo-European, as is likely, then one might suppose
that the labiovelar had been analogically restored from an oblique stem form
*perkʷeu- in the dialects ancestral to Latin, in which case no inference about
differing relative chronologies of the sound changes can be drawn.⁹
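
The face-value argument from relative chronology can be made concrete with a small sketch. It is an illustration only: the three rules are simplified string rewrites of the changes discussed above, the function names are hypothetical, and the forms are written without accents or stem-final material. It shows that the same input *perkʷu- yields different outcomes depending on whether the assimilation or the delabialisation applies first.

```python
import re

# Simplified, hypothetical rewrite rules for the changes discussed in the text.

def assimilate(form):
    """*p ... kʷ > *kʷ ... kʷ: initial p assimilates to a following labiovelar."""
    return re.sub(r"^p(?=.*kʷ)", "kʷ", form)

def delabialize(form):
    """*kʷu > *ku: the labiovelar loses its labial element before u."""
    return form.replace("kʷu", "ku")

def lose_p(form):
    """Celtic only: any remaining *p is lost."""
    return form.replace("p", "")

def derive(form, rules):
    """Apply the rules strictly in the order given."""
    for rule in rules:
        form = rule(form)
    return form

# Ordering assumed for Italic: assimilation before delabialisation.
italic_oak = derive("perkʷu", [assimilate, delabialize])

# Ordering assumed for Celtic: delabialisation first, so nothing is left to
# trigger the assimilation and *p is lost regularly.
celtic_oak = derive("perkʷu", [delabialize, assimilate, lose_p])

# 'five' has no *kʷu sequence, so both orderings give the same result.
celtic_five = derive("penkʷe", [delabialize, assimilate, lose_p])

print(italic_oak, celtic_oak, celtic_five)
```

Under these assumptions the Italic ordering gives kʷerku- (cf. quercus) and the Celtic ordering gives erku- (cf. Hercynia), while ‘five’ comes out as kʷenkʷe under either ordering. If the labiovelar was analogically restored in the ancestor of Latin, as suggested above, the difference in ordering no longer follows.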

In my 2009 book, I entertained the possibility that Italic and Celtic shared the 
change of *ū to *ī before yod, sometimes called Thurneysen's Law. But Zair
(2009) has shown that the Celtic facts are amenable to a different interpretation.
The Old Irish word for ‘smoke’ de, gen. diad must go back to an immediate
preform *diots, gen. *diotos with a short i from earlier *dʰuh₂iots, *dʰuh₂iotos.
Zair explains this as *uh₂iV- > *uiV- > *iyV-. Fortson (2017: 838) argues therefore
that Thurneysen's Law is a different phenomenon. But the whole complex of
facts deserves more discussion than we can give it here. I limit myself to two
observations. First, the forms of the verb ‘to be’ with an ī reflecting *bʰuh₂-ie-
cannot be explained by an Italo-Celtic rule (Lat. fio, Osc. fiiet, OIr. biid, but MW
byd points to a short *i) because these forms are also found in Germanic and
Balto-Slavic (OE consuetudinal present bið, Lith. pret. 3ps. bit(i), OCS conditional
bi).¹⁰ Second, while Latin is uninformative about the vowel quantity in
prevocalic position, the Sabellic cognates of pius point unambiguously to a short
i (Umb. pehatu, Pael. pes etc.).¹¹ This raises the possibility that the development
in Italic, like Celtic, was by way of a short vowel. 

In the realm of morphology we may note first the thematic genitive in *-ī: Ogham
Ir. MAQQI ‘son’, Gaul. SEGOMARI ‘Segomaros’, Lat. AISCOLAPI ‘Aesculapius’.¹²
Although the building blocks of the *-ī genitive appear to be Proto-Indo-European
(see Weiss 2020a: 204), the complete integration into the thematic nominal paradigm
is uniquely Italic and Celtic. And yet this cannot have been a Proto-Italo-Celtic
innovation. It is clear that the replacement of the inherited thematic gen.sg. *-osio
happened in the individual Celtic and Italic languages. VOL *-osio is well represented
in Satrican VALESIOSIO and in Faliscan euotenosio. Lepontic -oiso is
probably a transformation of *-osio under the influence of the pronominal
gen.pl. *-oisom. This means that Latin and Celtic in the historical period have
independently replaced an inherited ending with the same piece of morphology.
This could hardly be a contact phenomenon.¹³ Most scholars agree that the origin


9 The paradigm of the word for oak must have preserved its second syllable labiovelar in some
forms. Cf. Querquerni the name of a Celtic tribe of Gallaecia ‘people of the oak forest’.

10 See Hill 2012 for these forms. Hill does not discuss the Italic forms.

11 The Oscan form piíhiúí may be morphologically different (< *piiio-).

12 On the Messapic genitive in -aihi, which is not related, see Weiss 2020a: 221, 494; Matzinger
2019: 37.

13 The first instance of -ī in Latin is from the fifth/fourth century BCE Muracci di Crepadosso in
Latium (MORAI ESOM ‘I am of Morra.’) The first secure Celtic example is from the second
century BCE. It's highly unlikely that the -ī morpheme could have been transferred from Latin to
of the -ī genitive is to be sought in the so-called vṛkī́ḥ suffix *-ih₂, which makes
substantives with genitival meaning from thematic nouns. The question then
arises: what function could the vṛkī́ḥ suffix have acquired in Italic and Celtic
that made it a favorable candidate for eventually replacing the inherited thematic
gen.sg.? Answering this question is difficult because we have no attested textual
evidence from Italic or Celtic showing both the inherited genitive and the vṛkī́ḥ
suffix. A necessary mid-stage for the transformation of the vṛkī́ḥ suffix-forms,
which are substantives in Indo-Iranian, into an adnominal case form would be
their use as adjectives. This would be another instance of the so-called weak
adjective phenomenon in which an original substantivized form becomes an
adjective. Could the reinterpretation of the vṛkī́ḥ suffix-forms as adjectives be
the shared Italo-Celtic innovation that laid the groundwork for the eventual
independent emergence of the ī-genitive?

The ā-subjunctive: OIr. ·bera ~ Lat. ferat ‘carry’. Both Italic and Old Irish
display a morpheme ā used to form the subjunctive.¹⁴ In Latin this makes the
subjunctive to thematic present stems, but relic forms of Old Latin and Sabellic
show derivation from the root (advenas, atulas, Umb. neifhabas). This must
represent an old pattern. In Old Irish the ā-subjunctive is formed to weak
presents and strong presents ending in b, r, l, m, and n plus agaid.¹⁵ Class
S 3 (nasal infix presents to seṭ roots) affix the suffix to the root with no nasal
infix (benaid – bia). There are two schools of thought on the Italo-Celtic or
Italic and Celtic ā-subjunctive. One view, the traditional one, identifies the
morphemes of the two language families. The other view, originating with
Rix (1977) and significantly improved by McCone (1991), derives the
Insular Celtic ā-subjunctive from *-ase-, either the desiderative morpheme
*-h₁se- (Rix) or s-aorist subjunctive morpheme added to laryngeal final roots
(McCone). The advantage of the McCone view is that it allows both Old
Irish subjunctives to be derived from a single Proto-Indo-European category.
But the disadvantage is that the starting point for the ā-subjunctive on this
hypothesis would be the s-aorist subjunctive built to seṭ roots; such
a category, which is very sparsely attested in other Indo-European languages,
would have to have become very successful in the prehistory of
Celtic.

The superlative formant *-ism̥mo-: OIr. tressam ‘strongest’ < *treksisamos,
MW hynaf ‘oldest’ < *senisamos, Lat. maximus ‘greatest’ < *magisVmos,


Gaulish and then from Gaulish to the ancestor of the Insular Celtic languages, which were 
already on the British Isles by this time. 

14 The oft-cited Tocharian class V ā-subjunctive (Toch.A wekas ‘will disappear’, Toch.B märsam
‘will forget’) does not belong with the Italic and Celtic forms. PIE *ā becomes CToch. *å
(Toch.A a, Toch.B o, e.g. Toch.A pracar, Toch.B procer ‘brother’ < *bʰrātēr < *bʰreh₂tēr). See
Jasanoff 1994: 206-7.

15 Strong presents ending in a velar and dental form the subjunctive with -s-.
Pre-Samnite ϝολαισυμος ‘best’ (see Cowgill 1970). Even strong opponents of
Italo-Celtic like Marstrander admit the strikingness of this agreement. 
Marstrander (1929: 246) wrote: 


Une forme tout à fait identique comme irl. nessam, osque nessimo- doit provenir
d'une même source primitive; on ne saurait guère admettre qu'elle se soit développée
indépendamment dans les deux langues. Mais il n'en suit pas nécessairement qu'elle ait
pris naissance à une époque d’unité italo-celtique.

[An absolutely identical form like OIr. nessam, Osc. nessimo must derive from the
same original source; it would hardly be possible to accept that it had developed 
independently in the two languages. But it does not necessarily follow that it arose in 
an era of Italo-Celtic unity.] 


Marstrander thought the proto-form of the superlative suffix was *-sm̥mo- and of
“haute antiquité” [“remote antiquity”], hence a shared inheritance. But we know
today, thanks to Warren Cowgill, that the proto-form was in fact *-ism̥mo- and it
is certain that *-ism̥mo- replaces the earlier superlative formant *-isto- continued
by Greek, Indo-Iranian, and Germanic, which was inherited into Italic as traces
like iuxta ‘nearest’ and probably ioviste ‘youngest’ and solistimus ‘most
favorable’ show.¹⁶ Furthermore *-isto- could have been remade as
*-ism̥mo- under the influence of the well-attested superlative suffix *-m̥mo-,
which is normally added to pronominal and adverbial stems. But on what
basis could a theoretical archaism *-ism̥mo- be remade to *-isto-, since the
suffix -to- would not otherwise occur as a superlative formant? The superlative
formant *-ism̥mo- seems the strongest argument for Italo-Celtic. It
should be noted, by the way, that the same formant is continued in (para-) 
Venetic (VENIXEMA from Emona), but this is unproblematic if one believes, as 
I do, that Venetic was an Italic language. 

Primary 3rd person middle endings *-tro, *-ntro: OIr. do·moinethar
‘thinks’, Umb. herter ‘should’ < *her(i)tro.¹⁷ The ending *-ntro results from
a contamination of *-ntor and *-ro and the innovation spread from the 3rd 
plural to the 3rd singular. This innovation did not succeed in completely ousting 


16 For possible traces of the superlative suffix *-isto- in Celtic personal names, see Prosper 2018: 
128-9. Some reconstruct a laryngeal after the *t because of Ved. -istha-. 

17 The source for the Old Irish deponent 3rd singular and plural endings and the Umbrian primary
middle endings must be reconstructed as *-trV, *-ntrV. If the ending had been *-tor, a pre-OIr.
*sekitor ‘follows’ would have syncopated the medial vowel. The attested form sechithir points
to an immediate preform *sekitr. See Thurneysen 1946: 367 and Jasanoff 1997. Final *-(n)tro
in Umbrian and Oscan became [ter]. In Oscan the new vowel merged with old *e and is
consistently written with e. In Umbrian the vowel merged with the reflex of short i and is
written with e in the Umbrian alphabet and e, i, or ei in the Latin alphabet. Meiser (1986: 112)
champions Ebel’s suggestion to derive the forms from *-ti-r and *-ntir with an r tacked on to the
primary active personal endings, but this is unnecessary because there is just not enough
evidence to show that the outcome of final *Cros was anything different. On Umb. ocar,
which is from *okaris not *okris, see Weiss 2013: 349.

*-tor and *-ntor in either Italic or Sabellic. In any case, there is no evidence for
this contamination elsewhere in Indo-European. 

At a much lower level of importance are the many shared lexical items, since
content words can be easily borrowed. Nevertheless, some of these items show 
striking morphological and semantic specializations. Some examples follow. 

Lat. crispus ‘curly’, MW crych, Gallo-Lat. PN Crixsus continue 
a proto-form *kripso- from the root *kreip- ‘turn’ found also in 
Balto-Slavic (OCS krěsъ ‘solstice’, Lith. kreipti ‘to turn’). The
Italic, Celtic, and Slavic forms presuppose an s-stem *kreipos 
‘turning’. In Proto-Italo-Celtic the s-stem made a thematic deriva- 
tive, which, in the most archaic fashion, triggered a double zero- 
grade of the pre-suffixal stem. The meaning ‘having turning’ was 
specialized to ‘curly’ and ‘wrinkled’, both meanings attested in 
Welsh and Latin.¹⁸

Lat. deses, désidis ‘lazy’, ‘inactive’, OIr. deeid < *de-sed(i)-. The
Latin adjective, which is not attested before Livy, has been sus- 
pected of being backformed from désidia ‘idleness’ (Plautus +), but 
the close match with the Irish adjective makes this unlikely. The 
Irish and the Latin form presuppose a semantic development *de/
*deh₁ + *sed- ‘to remain seated’ (cf. Lat. desideo) > ‘to be idle’.

Lat. saeculum ‘lifespan’, MW hoedl ‘lifetime’ < *saitlom < *seh₂itlom,
Gaul. deae setloceniae < *saitlokeiniio- ‘goddess of long life’
(cf. OIr. cian ‘long’). This match is perfect and, if correctly
derived from the root *seh₂i- ‘bind’, shows a striking semantic
development. The oldest recoverable meaning for both hoedl 
and saeculum is ‘lifespan’. Thus in early Rome, according to 
Etruscan belief, a saeculum extended from some important date 
like the founding of Rome until the last person alive at that 
initial time died. This meaning could have arisen from the idea 
of a binding knot, marking the ends of life. Cf. Ved. párur- ~ 
párvan- which means ‘a knot’, ‘a limit’ and also ‘a fixed period 
of time’. 

Lat. de ‘down from’, OIr. di, OW di. This preposition, probably the
instrumental *deh₁ of a pronominal stem *do-, has no precise
matches outside of Italic and Celtic. Though just a little word,
*deh₁'s import is considerable since it is part of a relatively small
set of quasi-functional prepositions. 


18 De Vaan (2008: 145) prefers a proto-form *krispo- which is equally possible on the grounds that
*kris- is attested in Latin in crinis ‘hair’ and crista ‘crest’, but neither crispus nor crych is 
exclusively a descriptor of hair, and it is easier to explain an -s- as a remnant of an old s-stem 
than a -p- as a root extension. 
Lat. Semo, a god of the oath often associated with Hercules, and Osc.
seemún- match Gaul. Segomon-, an epithet of Mars. These forms
converge on a Proto-Italic epithet *segʰo-mo, -mon- ‘strong-man’,
a secondary -mon-stem from a thematic stem *segʰo- ‘strength’
(MIr. segh). The form *segʰo-mo seems to have been a divine
epithet found nowhere but in Italic and Celtic (see Weiss 2017a). 

Whether one recognizes an Italo-Celtic node or not, the fact remains 
that Italic shares more innovative features with Celtic than with any other 
branch.¹⁹ Nevertheless, it should not be forgotten that both Italic and
Celtic individually and in common share many features with Germanic. 
This connection is not surprising given their geographical positions (see 
Weiss 2020a: 500-1). Somewhat more surprising are some striking agree- 
ments between Italic and/or Celtic and Indo-Iranian, famously highlighted 
by Vendryes. The phylogenetic import of these agreements is still unclear 
(see Weiss 2020b). 


7.3 The Position of Italo-Celtic²⁰


The relationship of Italo-Celtic to the rest of Indo-European can be conceived 
of as the answer to three questions. (1) Was Proto-Italo-Celtic the next clade to 
separate from the PIE tree after the separation of Proto-Tocharian? (2) How do 
we interpret the extensive lexical matches between Italic, Celtic, and the other 
northern Indo-European branches, Germanic and Balto-Slavic, the so-called 
vocabulary of the northwest? (3) What do we make of the striking matches, 


19 For a determined attempt to undermine the plausibility of Proto-Italo-Celtic from the phonological
side, see Isaac 2007: 75-95. His argumentation is based on very specific possible
formulations of the sound changes and, consequently, relative chronologies which, in my
opinion, either can be formulated differently or cannot be stated with sufficient certainty. For
example, Isaac relies heavily on the failure of the word for ‘yesterday’ to fall together with the
reflex of “thorn” clusters in Italic (*gʰdʰ(i)es- > Lat. heri). If the metathesis of TK to KT is Proto-
Italo-Celtic (or earlier) *gʰdʰes would have to become *dʰgʰ(i)es. But if this is the case, then
how did the Latin form escape the normal treatment of such clusters in Latin to initial s- (situs
‘decay’ < *dʰgʷʰitu-)? One solution, Isaac suggests, is to posit a simplification of *gʰdʰ(i)es to
*gʰ(i)es in Proto-Italic but not in Proto-Celtic where the outcomes with d (OIr. indé, W doe)
show that this simplification could not have applied. This difference would necessarily mean
that Proto-Italic was divergent from Proto-Celtic at this point and the metathesis, if shared by Italic
and Celtic, would be a diffused or independent event. By Isaac's chronology there would then be
no unique phonological innovations shared between Italic and Celtic predating this divergence.
But this assumes that the thorn cluster development was the result of simple metathesis. In fact,
what if, as argued by Jasanoff (2018), the key to the thorn cluster development was spontaneous
palatalization in TK clusters with subsequent metathesis? i.e. TK > TK’ > KT. If this was the
development, then there is no necessity for KT to have the same development as KTi. In some
languages these might have merged and in others, including Latin, they did not.

20 For the sake of this exposition, I will take the validity of the Proto-Italo-Celtic subgroup for
granted.

especially in the religious and legal lexicon, shared by Italo-Celtic and Indo-
Iranian? 

That Proto-Italo-Celtic was the next group to branch off after Proto-Tocharian
has been supported by some computational phylogenies of Indo-European
(see Figure 7.1) but not others.²¹ To show that Proto-Italo-Celtic was
the next to branch off would require demonstrating the existence of innovations
shared by all the other non-Anatolian, non-Tocharian branches that are not
found in Proto-Italo-Celtic. The best candidate for an innovation of this sort is
the thematic optative *-o-ih₁-, of which there is no certain trace in Italic or
Celtic, while it is well represented, or at least traceable, in Germanic,
Balto-Slavic, Indo-Iranian, Greek, Armenian, Phrygian, and Messapic.²² In place of
the thematic optative, on the view followed here, Italic and Celtic show the
*ā-subjunctive. Another possible innovation of the inner branches is the
replacement of the primary middle marker *-r by *-i, which is seen in Greek,
Phrygian,²³ Indo-Iranian, Germanic, Albanian, and possibly Balto-Slavic.


[Tree diagram; node labels: Indo-European, Anatolian, Indo-Tocharian, Tocharian, Italo-Celtic, Italic, Celtic, Germanic, Greek, Armenian, Albanian, Indo-Iranian, Balto-Slavic]

Figure 7.1 Tentative tree showing the position of Italo-Celtic


21 This is the finding of Ringe, Warnow & Taylor 2002, but it is not supported by the Chang et al.
2015 tree.

22 There is no indisputable evidence for the retention of the thematic optative in Albanian, but,
given its advanced state of development at time of first attestation, this is not too surprising.

23 Old Phrygian has only -toi. New Phrygian has two instances of a 3sg. sequence -tor. It's not clear
that these are to be compared with the r-middle forms of Anatolian, Tocharian, and Italo-Celtic.
These two potential isoglosses seem to constitute the total evidence for
innovations not reaching Proto-Italo-Celtic.

At the same time, it is clear that Proto-Italo-Celtic was in close contact with
the rest of the northwestern Indo-European branches. Meillet (1922) famously
identified a long series of lexical items shared between Italic, Celtic, Germanic,
Baltic, and Slavic that found no matches in the other IE languages (cf. also
Oettinger 2003). With greater knowledge of Anatolian, Tocharian, and the later
Iranian languages, some of these supposedly exclusive items must be reevaluated.
For example, the root *seh₁- ‘sow’ (Lat. semen ‘seed’, OIr. síl, OHG
sāmo, OCS sěmę ‘seed’, Lith. sėti ‘to sow’) now has a cognate in Hitt. šāi,
šiyanzi ‘to press’. The item *seh₁- must be reconstructed for highest node PIE,
but the specialization to ‘sow’ is still only found in the northwest. On the other
hand, Meillet’s example *porkos ‘piglet’ (Lat. porcus, OHG farah, Lith.
paršas, CS prasę) is no longer valid since a cognate is attested in Iranian
(YAv. pərəsa-, Khot. pāsa, etc.).

Nevertheless, there are still many items with a northwestern distribution.
Some of these might be common or independent borrowings from substratal
languages. This scenario is especially plausible for the names of flora and fauna.
An example of this sort might be ‘alder’. The cognates for this word show
a remarkable amount of formal variation that is difficult to trace back to
exclusively Indo-European morphophonology: Lat. alnus < *alsno-; PGmc. *aliso
(ODu. elis in place-names; MDu. else, Sp. aliso) ~ *alizo (OHG elira) ~ *aluz-
(ON ǫlr, OE alor); Lith. aliksnis, alksnis, elksnis; PSl. *olexa (Ru. ol’xá) ~ *eloxa
(Ru. dial. elxa, Bulg. elxa) ~ *olesa (Cz. olše) ~ *elisa (SCr. jélsa). Cf. Basque
haltz. The word may, however, also show up in Macedonian ἄλιζα (Hsch.),
glossed as ‘poplar’.

Two terms relating to agricultural technology with somewhat overlapping 
meanings are (1) */(V)ih,seh> ‘furrow, track’ (Lat. lira ‘furrow’ < *leih,seh;; 
OPr. lyso ‘field’ < *lih;seh», cf. Lith. lysé <*lih,siieh2; OCS léxa ‘row’, OHG 
-leisa ‘track’ «*loih,seh; ^) and (2) *polkeh; ‘ploughed piece of land’ (OE 
fealh ‘ploughed land’, Gaul. *olca ‘arable land’ (Gregory of Tours olca, OFr. 
ouche, Port. olga), ORu. polosa ‘strip of land’). In Latin, Germanic, and 
Slavic the root *plek- ‘plait’ has acquired a -t-extension: Lat. plectere, OHG 
flehtan, OCS pletg. Contrast the unextended *plek- in Lat. ex-plicere and Gk. 
rAEK@. A piece of military technology is reflected by the word for ‘shield’: 
Lat. scütum, OPr. staytan for *skaitan < *skoitom vs. Olr. scíath, MW ysgwyd, 
OCS štite < *skeitom. 

There are a number of words relating to social structure. Most famous is the
word *teuteh₂ ‘people’ (Osc. touta, Goth. þiuda, OIr. túath, Lith. tauta). And in


24 Whether these forms are further connected with the root *leis- ‘learn’ (LIV² 409) is doubtful, but
in any case, the agricultural meaning is a shared feature of the northwest.
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quasi-opposition to *teuteh> is *g’ostis ‘guest-friend’ (Lat. hostis, Ven. hosti- 
hauos, Goth. gasts, OCS goste). From the legal sphere we have *d'elg^- ‘owe’ 
(Olr. dligid ‘is owed’, Olr. dliged ‘law’, Goth. dulgs ‘debt’, OCS dlog» ‘debt’, 
though the Slavic forms might be a loan from Gothic) and *uad'- ‘surety’ (Lat. 
vas, vadis, Osc. vaamunim ‘vadimonium’ < *uafemdniiom, Goth. wadi 
‘pledge, surety’, Lith. vädas ‘surety’ (obsolete)).^ 

Finally, it’s been observed since Vendryes 1918 that Italo-Celtic and
Indo-Iranian share a number of culturally important words relating to the religio-legal
sphere not occurring in the intervening languages. The most notable of these are
the words *h₃rēǵs ‘rule’, ‘king’ (OIr. rí, Lat. rēx, Ved. rā́ṭ) and *kred(s)-dʰeh₁-
‘to trust’, lit. ‘place heart’ (OIr. creitid, Lat. crēdere, Ved. śraddhā́ ‘trust’).
Vendryes regarded these agreements as archaisms that were discarded in the
intermediate languages, but it is striking that the supposed archaic status of these
items is not confirmed by evidence from Proto-Anatolian or Proto-Tocharian.²⁶
8 Italic


Michael Weiss 


8.1 Introduction 


The Italian peninsula before the Roman conquest was home to a large number
of languages, both Indo-European and non-Indo-European.¹ Among these
languages, the following have been thought to descend from a common ancestor,
Proto-Italic (cf. Figure 8.1).

1. Latin, spoken in Latium in a number of slightly divergent dialects for most of 
which we have only scant information from inscriptions and glosses. The 
Latin of Praeneste, which is the findspot for the two earliest Latin inscrip- 
tions, and the Latin of Falerii are reasonably well attested in inscriptions 
dating from the seventh to second century BCE. The Latin of Falerii is often 
classified as an independent language called Faliscan, though this is not 
justified on linguistic grounds.² But towering above all is the Latin of Rome.
In this language we have a small number of inscriptions from the seventh–sixth
centuries BCE in a distinctively archaic form, which I call Very Old
Latin. After slowing to a trickle in the fifth century, Latin inscriptions pick up
again in the fourth century and are joined by literary documents in the third
century. The Latin of Rome spread first to Italy, suppressing the previously
existing linguistic diversity, and then to most of Western Europe, North
Africa, and southeastern Europe north of the Jireček line.³ Roman Latin
survives today in its multiple descendants, the Romance languages. 

2. The Sabellic languages. These languages, which form an as yet unquestioned
subgroup, are
a. Oscan, the language of the Samnites of central and southern Italy, who
also expanded into Campania and Sicily, represented by about 800
inscriptions dating from the mid-fourth century BCE to perhaps as late as
the first century CE


1 See Weiss 2020: 15-18 for a survey.

2 On Faliscan see Bakkum 2009. There are about 360 linguistically informative Faliscan
inscriptions dating from the sixth to second centuries BCE. On Praenestine, see Franchi de Bellis 2005.

3 An imaginary line drawn by the historian Konstantin Jireček marking the southern extent of Latin
influence in southeast Europe.
Figure 8.1 The Italic languages


b. Umbrian, known chiefly from the Iguvine Tables from Gubbio (third–second
century BCE; see Weiss 2010) and about forty smaller inscriptions,
a few as early as the seventh century

c. South Picene,⁴ the language of fewer than thirty inscriptions from the
Marche and Abruzzo dating from the sixth–third centuries BCE

d. Pre-Samnite,⁵ the language of inscriptions from Campania before the
Samnite conquest in the fifth century; the longest document is the Cippus
of Tortora from the sixth–fifth centuries

e. In addition, there are a number of short texts in the dialects of the Volsci,
Marsi, Paeligni, Marrucini, Vestini, and Hernici.⁶ We also have a number
of Sabellic loanwords in Latin (bos ‘cow’ for expected *(w)ūs < *gʷous
being the most prominent of them).⁷

3. Venetic, attested in more than 400 inscriptions from the northeast corner of 
Italy from the sixth to first centuries BCE. Some documents have been 
discovered in neighboring Slovenia and Austria. Not all scholars would 
agree that Venetic is an Italic language. 

4. Sicel, the language of a small number (fewer than thirty) of pre-Greek
inscriptions of eastern Sicily from the sixth to fourth centuries BCE and a number of
glosses.⁸ It is very difficult to determine much about this language beyond that
it was Indo-European, as the form pibe ‘drink!’ = Ved. piba shows


4 See Zamponi 2021 for a survey of the evidence.

5 But see Clackson 2015: 26-7, who questions Rix's idea of uniting these texts and some
inscriptions of Lucania as a unitary language. Adiego (2015) prefers Opic for the language of these
inscriptions, which must have coexisted with Oscan for some time after the Samnite invasion of
Campania.

6 There is no space to discuss the internal subgrouping of Sabellic, which is also not
uncontroversial. See Clackson 2015 and Fortson 2017: 847-51.

7 See Poccetti 2017 for more detail.

8 On Sicel, see Ambrosini 1984, Agostiniani 1992, Campanile 1969, Willi 2008: 341-8, Poccetti
2012, and Hartmann 2018.
conclusively.⁹ There are a few items in Sicilian Doric Greek that seem to match
Latin and that are suspected of being of Sicel origin, e.g. lítra ~ Lat. libra
‘pound’, kubiton ~ Lat. cubitum ‘elbow’. The few inscriptions that are longer
may show some Italic lexical material such as Mendolito geped ‘had’¹⁰ with
a simple perfect comparable to Osc. hipid, Grammichele dedaxed ‘made’ (?)
(see Machajdíková 2018: 151), perhaps with a reduplicated k-extended form of
the root *dʰeh₁- like VOL fhe:fhaked, Osc. fefacid, or the female name Kup(a)ra,
which recalls Sabellic *kupro- ‘good’. If Sicel is Italic, it would diverge
from all other members in showing voiced reflexes of the PIE voiced aspirates
in initial position in contrast to the f- < *dʰ, *bʰ and h- < *gʰ/*ǵʰ seen in Latin,
Sabellic, and Venetic.


8.2 Evidence for the Italic Branch 


Positing Proto-Italic as the superordinate node of Latin, Venetic, and Sabellic is not
uncontroversial, though it is supported by substantial phonological and morphological
evidence: the merger of *bʰ- and *dʰ- as *f-,¹¹ the gerundive in *-nd-, the ipf.
subj. *-sē-, the ipf. *-fā- (the more probative morphological features are unattested
in the fragmentary Venetic corpus). Proto-Italic was recognized as a node from the
start of the serious scientific investigation of the Indo-European languages. But
some scholars beginning with Walde (1917), Muller (1926), and Devoto (1929)
have challenged this assumption and argued instead that Latin and Sabellic are two
separate branches that have undergone a secondary process of convergence.¹²
And indeed, there is no doubt that much convergence has happened between
Latin and Sabellic. For example, the change of intervocalic *-z- to -r-, called
rhotacism, affects both Latin and Umbrian but not Oscan and can be shown to
have happened long after the initial separation of both languages. In Latin the
change happened sometime in the fourth century, and the Umbrian change may
have happened around the same time. Initial di̯- is eventually simplified in
Latin, Umbrian, and Oscan to i̯- (except in Bantia), but again these changes
happened within the historical record for Oscan and Latin at least. Both Latin
and Sabellic show deletion of the final primary marker *-i in the 1sg., 2sg., 3sg.,


9 On a kylix from Aidone. See Lejeune 1990.

10 For an ingenious attempt to make sense of this text, see Martzloff 2011.

11 But if, as I argued in Weiss 2018a (following Walde 1906), *dʰragʰ- ‘drag’ > *dragʰ- > *tragʰ- >
trah-ere by Limited Latin Grassmann's Law and the change of *dr- to tr-, and if Limited Grassmann's
Law is only Latin, then there would be evidence that reflexes of *bʰ- and *dʰ- did not fall together in
initial position in Italic so that the f- from both *bʰ- and *dʰ- would have to be a diffused trait. The
problem is that we can't determine what the date of Limited Grassmann's Law is, due to the lack of
Sabellic evidence. It could be ordered very early in Proto-Italic before the devoicing of voiced
aspirates or even theoretically in Proto-Italo-Celtic times.

12 A thorough survey of the arguments up until 1950 is provided by Diver's unpublished
dissertation of 1953. Further important works are Jones 1951, Safarewicz 1963, Beeler 1966, and
Campanile 1968.
and 3pl. (Meiser 1998: 74 after Rix 1996: 158). But the survival of an
unapocopated final -i in tremonti in the Carmen Saliare makes it unlikely that
this apocope dates to Proto-Italic times. The Carmen Saliare is old, but not that
old, and the text has a specifically Latin form of the acc. pronoun tet and so
could not be “Proto-Italic”. Instead, the apocope must be a diffused change.¹³

But while it is easy to show some degree of phonological convergence and, 
of course, lexical interchange and syntactic influence within the historical 
period, I know of no case of a Sabellic morpheme being adopted into Latin 
or vice versa. We have no v-perfects in Sabellic, no -tt-perfects in Latin, no 
Latin infinitives in -om, no Latin athematic nom.pl. in -s, and so on. This 
difference between phonological, lexical, and syntactic permeability vs.
morphological impermeability is not surprising: morphology is known to be more
resistant to borrowing, but the absence of morphological borrowing within the 
attested timeframe, in a period when the Sabellic and Latin languages were in 
intense contact, should strengthen our confidence in the value of shared morph- 
ology for establishing the Proto-Italic subgroup. 

There are a number of shared phonological developments unique to the Italic
languages that cannot be shown to be the result of convergence and thus are
good candidates for defining innovations of Proto-Italic. The difficult issue is
deciding whether they are non-trivial. First on this list is the development of the
PIE voiced aspirates. In initial position PIE *dʰ and *bʰ developed to f and
*gʰ/*ǵʰ to h in Latin, Sabellic, and Venetic: *bʰuh₂- > Lat. fu-ī ‘I was’, Osc. fu-st,
Umb. fu-st ‘will be’; Transponat *dʰh₁k- ‘make’ > Lat. facio, Osc. fakiiad,
Umb. facia subj.3sg., Ven. vhagsto pret.3sg.; *ǵʰorto- ‘enclosure’ > Lat.
hortus ‘garden’, Osc. húrz; Ven. hosti- < *gʰosti- ‘guest’. In medial position,
on the other hand, the voiced aspirates became voiced fricatives, and the labial
and dental fricatives were not merged:¹⁴ Lat. nebula ‘cloud’ < *neβVlā <
*nebʰVleh₂, cf. Ved. nábhas- ‘cloud’; aedes ‘temple’ < *h₂aidʰ- ‘burn’, cf.
Gr. αἰθόμενος ‘burning’; Lat. Samnium, Osc. safinim by anaptyxis from *safnim
and Gr. Σαύνιον point to a Sabellic *safiniio-;¹⁵ Ven. louderobos ‘children’
dat.pl. < *h₁leudʰerobʰos. The velar fricative went on to become h everywhere
except Faliscan where it hardened to g: *meǵʰei ‘me’ dat.sg. > Lat. mihi, Umb.
mehe, *legʰeti > Fal. lecet /leget/ ‘lies’, cf. Goth. ligan.¹⁶ The combination of
devoicing, merger of *dʰ and *bʰ in f in initial position and voicing in medial
position (whether this voicing directly continues the voicing of the voiced


13 For more on diffused changes, see Weiss 2020: 496-7. For what it is worth, Sicel appears to
have an unapocopated esti.

14 Though eventually they did merge in Sabellic and Faliscan.

15 Gr. Σαύνιον, which must have been borrowed from a Sabellic form before anaptyxis, shows that
the Greeks heard the fricative spelled by the Oscans with f as a voiced sound.

16 The reflex of *gʰ was still an obstruent in Umbrian at the time of medial syllable syncope
because when this sound comes together with t by syncope it develops to j in the same way as k,
e.g. *-uegetod > *-ueyetod > afveitu ‘bring’ parallel to *fakitōd > feitu.




aspirates or is a revoicing) is a set of developments that is found in no other
Indo-European branch.¹⁷

A set of sound changes that are certainly shared by Sabellic and Latin is the
absorption of a short vowel after yod in a medial syllable. This sound change led to
the creation of the 3rd io-type in Latin and its analog in Sabellic, e.g. *kapiesi ‘you
(sg.) take’ > *kapisi > capis. Cf. Osc. factud (Lu 1.9) fut.ipv.sg. ‘make’ < *fakitōd <
*fakietōd. After a base of more than one mora, there was an epenthesis of i before
the yod prior to absorption of the e: *sent-iesi > *sent-iie-si ‘you (sg.) feel’ >
*sent-isi > sent-is. Cf. Umb. amparitu fut.ipv.sg. ‘raise’ < *am-par-iie-tōd.¹⁸ The sound
changes that produced this system appear to be quite early since they predate the
resolution of syllabic sonorant consonants (see Fortson 2018), but, on the other
hand, these sound changes appear to be distinct from similar changes in Germanic
and Celtic.¹⁹

Another interesting phonological development is the outcome of *m̥mV. This
sequence first arose by the loss of a laryngeal or by Lindeman's Law. It is also
found in the ordinal and superlative suffix *-m̥mo-, which is of uncertain
analysis. In Latin and Venetic the supporting vowel is o, e.g. *ǵʰm̥mo > Lat.
homo ‘man’, cf. Goth. guma, *dekm̥mos > Ven. dekomei loc.sg., cf. Celtib.
tekametam ‘tenth’. In Sabellic the outcome appears to have originally been u.
The best evidence for u is Osc. últiumam ‘last’, Palaeo-Umbrian setums (a
personal name, lit. ‘seventh’) < *septm̥mos, the Pre-Samnite superlative
Ϝολαισυμος ‘best’ nom.pl.²⁰ It's not certain what the Proto-Italic state was.
We can certainly exclude *um since that would not be lowered to om in Latin,
cf. tumor ‘swollen condition’, gumia ‘glutton’. It’s conceivable that *om would
have been raised to um in a medial syllable in Sabellic, but there is no
independent evidence for such a change. Rather than miss the generalization
that the Italic languages uniquely have a rounded vowel as the reflex of


17 But note that if Sicel is Italic, and if the evidence is correctly interpreted as showing that Sicel 
had voiced reflexes in initial position, this isogloss would have to be interpreted differently. 
Either the initial PIE voiced aspirates first became voiced fricatives that were retained or became 
stops in Sicel and were devoiced in the rest of Italic, or the initial voiced aspirates became 
voiceless fricatives which were then voiced in Sicel in all positions (cf. the famous southern 
British English from which the standard dialect borrowed vat and vixen). Whichever interpret- 
ation is correct, the Italic developments would still be unique. 

18 The sound changes that produced the 3rd io ~ 4th conjugation contrast are attested outside of
verbal morphology, e.g. *diieuiio- ‘heavenly’ > *diuiio- (Osc. diiviai), so it is uneconomical to
set up athematic i-inflection for present stems which are functionally identical to the
*ie-/io-presents of other branches.

19 It is attractive to derive the endings of Old Irish i-stem verbs (absolute gaibid, conjunct ·gaib
‘takes’) from an immediate preform *-iti < *-ieti. But there are Gaulish forms that appear to
show the unreduced sequence -ie- (bissiet ‘will be’), and the Sieversish distribution seen in
Sabellic does not hold in Old Irish, e.g. bruinnid, ·bruinn ‘flows forth’ with an S2 inflection
after a heavy base.

20 The Oscan form humuns ‘men’ and Umb. homonus dat.pl. are ambiguous. humuns is written in
an alphabet that does not distinguish u and o, and Umbrian lowers u to o before m.
prevocalic *m̥, which contrasts with its development in preconsonantal position,
it may be preferable to reconstruct *ɘm with a close mid central unrounded
vowel that merged with either o (in Latin and Venetic) or u (in Sabellic).²¹ There
are a number of other phonological features that could be mentioned, but they
are all problematic in one way or another.²² The phonological innovations are
admittedly not many, but they are indicative of a subgroup.

It is the shared morphological innovations that, in my opinion and the
opinion of most experts, make the existence of Proto-Italic unavoidable.
The Italic languages share a new verbal adjective, the gerundive, with the suffix
-ndo- in Latin and *-nno- in Sabellic (Osc. úpsannam ‘to be constructed’, Umb.
ocrer pihaner ‘to purify the city’). The origin of this form and the synchronically
related gerund, not attested in Sabellic, are much debated. The original function of
the form seems to have been quite similar to a middle participle as we can see in
synchronically isolated cases such as Lat. secundus ‘following’ ~ sequor ‘I
follow’, oriundus ‘arising’ ~ orior ‘I rise’, but the semantic development to
a verbal adjective of necessity is found in both Sabellic and Latin. Whatever its
origin, the gerundive has no analogs outside of Latin and Sabellic.

The imperfect subjunctive in *-sē-, e.g. Osc. fusíd = Lat. foret, Lat. es-sē-s
‘be’ ipf.subj.2sg., is another morpheme of disputed origin.²³ It does not have
any comparanda outside of Italic.²⁴ The category is not attested in Umbrian or
Venetic. But beyond the existence of an identical morpheme for this category, it
is worth noting that a subjunctive system with present, imperfect, perfect, and
presumably pluperfect is a uniquely Italic way of organizing the verb.

Another Italic-only verbal exponent is the imperfect indicative morpheme
*-fā- (Lat. dūcēbās ‘you were leading’, Osc. fufans ‘they were’, Vest. profafa-,
i.e. = Lat. probābā- ‘was approving’).²⁵ It is generally agreed that *-fā- is


21 Alternatively, one could suppose that pre-vocalic syllabic *m̥ was preserved in Proto-Italic and
then developed in slightly different ways in the daughter languages. In this way one loses the
generalization that all Italic languages show a rounded vowel in this environment, but it is not
too shocking that a prop vowel should develop to a rounded vowel before a labial. A distinct
syllabic *m̥ arose in the 1sg. of the verb ‘to be’: *esmi became *esm̥ by the loss of the final
primary marker *-i, and this developed to esom in VOL (Garigliano ESOM), but there is quite
a bit of variation here. Latin itself attests sim, said to be Augustus’ favored form, and Sabellic
has the same two variants *som (Osc. súm) and *sim (Pre-Samn. and SPic. sim). There are also
Sabellic forms in esum (TE 4 SPic., Ps 4, 5 esum) and sum (Ps 13), but these are in alphabets
that don't distinguish u and o.

22 See Weiss 2020: 496-8. On the development of syllabic liquids in Italic, see Zair 2017.
Thurneysen-Havet's Law (in the formulation of Vine 2006) and the development of *CRHC
to CaraC are thought to be conditioned by the PIE accent and would be early, potentially of
Proto-Italic date. I don't believe that there are any secure examples of Thurneysen-Havet's Law
in Sabellic, but there is one good instance of *CRHC to CaraC (Umb. parfa (type of bird) <
*parasá < *pr h,seh2, see Höfler 2017). This rule has a close parallel in Greek, however, and it is
thus conceivable, though unlikely, that the Latin and Sabellic developments were independent.

23 Cf. Jasanoff 1991, Meiser 1993, Rasmussen 1996, Christol 2005 for some recent attempts.

24 Despite Campanile's attempt (1968: 59) to connect it with the Brittonic subjunctive.

25 On the last, see Dupraz 2010: 321.
a form of the root *bʰuh₂- ‘be’ combined with the morpheme *ā < *eh₂ also
seen appended to the root *h₁es- in the unique imperfect stem seen in 2sg.
erā-s ‘you were’.²⁶ The combination of this extended root shape with a nominal
form of the verb, probably originally an instrumental, is only found in Italic.
Again, the corresponding forms, if any, in Umbrian and Venetic are unknown.

Another shared Italic innovation is the replacement of the Proto-Indo-European
2nd plural middle ending *-dʰue with a form containing *-m:
Lat. -mini, Sabellic *-mX inferable from fut.ipv.mid. ending Umb. -mu and
Osc. -mur < Proto-Sabellic *-mor.

Ideally, it would be preferable to derive these forms from the same proto-form.
If, as most scholars believe, the Latin mid.2pl. ending continues the
nom.pl. of a middle participle (< PIE *-mh₁no-), the Proto-Italic form, if there
was one, would have been *-manos with the inherited thematic nom.pl.
ending. In Latin the analogy that produced the mid.ipv.3sg. was act.ipv.2pl.
-te : fut.ipv.act.3sg. -tōd :: mid.ipv.2pl. *-manoi : fut.ipv.mid.3sg. *-manōd >
-minō. That is, the acquirer got the idea that the fut.mid.ipv. was formed by
hacking off the final syllable of the act.ipv.2pl. and substituting -ōd. But this is
not the only way an acquirer could have conceived of the relation.
Alternatively the “rule” could have been “remove all material but the initial
consonant and substitute with -ōd,” i.e. -t-e : -t-ōd :: *-m-anos : *-m-ōd. This
path seems the only way to unify the Italic forms and retain the observation
that both Sabellic and Latin have replaced the inherited mid.2pl. ending with
a form beginning with *-m.²⁷

There are a few other features shared between Latin and Sabellic, such as the
use of the interrogative-indefinite stem as a relative pronoun (Umb. po-i ~ Lat.
qui). But this is a common development and occurred independently in Hittite,
Thessalian, and elsewhere. Another oft-cited commonality is the creation of
a distinct ablative singular form for all declensions,²⁸ e.g. Osc. toutad
‘community’ from an ā-stem. However, Celtiberian also created distinct ablatives
for other stem types, e.g. ā-stem arekorataz (the name of a town attested on
a coin, i.e. ‘from the town of A.’).²⁹ Thus this innovation could have happened


26 There are other views, however, e.g. Willi 2016. 

27 The alternatives are worse: (1) Latin and Sabellic both replaced the mid.2pl. with m-initial
forms, which are unrelated. (2) While Latin had a reflex of *-mh₁no-, Sabellic had *-mo- like
East Baltic and Slavic, but there are to my knowledge no isoglosses connecting Sabellic and
Balto-Slavic, and Balto-Slavic *-mo- may in any case come from *-mh₁no-. (3) *-mh₁no- gave
*-mo- in Sabellic, but this is excluded by the many cases of survival of -mn-, both primary and
by syncope.

28 In the proto-language only o-stems made distinct abl.sg. forms. In all other stem types, the abl.
sg. and the gen.sg. were identical. Another shared innovation of Sabellic and Latin is the
extension of the PIE thematic instrumental to the dative-ablative (VOL -ois, Osc. -úís), but
Venetic has retained the more archaic -obos in louderobos ‘children’ dat.pl.

29 See Villar 1995 and Beltran & Jordan 2019: 251-3. Young Avestan, independently, did the same
thing, e.g. zaoθraiiat ‘libation’ from an ā-stem, etc.
(1) independently in Latin, Sabellic, and Celtiberian, (2) in Proto-Italic and
Celtiberian, or (3) in Proto-Italo-Celtic.³⁰

The realm of derivational morphology, which is typically underexploited in
discussions of subgrouping, also displays a number of striking shared Italic
innovations. For example, the suffixes *-āsiio- (Umb. farariur ‘pertaining to
grain’ nom.pl.m. = Lat. farrarius), and *-āli- ~ dissimilated to -āri- after a base
containing an l (Umb. sorsale ‘of pig’, staflare ‘of the stall’ ~ Lat. mortalis
‘mortal’, militaris ‘military’) are exclusively Italic.³¹ Both Latin and Sabellic
have specialized the conglomerate *-kelo- to form diminutives to nonthematic
bases (Osc. zicolom, Umb. ticel ‘day, date’ ~ Lat. diecula). Both Latin and
Sabellic have a predominantly deverbal adjectival formant *-dʰli- (Lat.
amabilis ‘lovable’, Umb. purtifele ‘to be offered’).

The shared lexical material of Italic is extensive. Safarewicz (1963)
estimated that, with obvious loanwords excluded, 49 percent of the Oscan
vocabulary known to him had exact matches in Latin.³² Of course, a true doubter of
Italic unity could claim that this high percentage results from borrowing. But
there are several items where semantic divergences make recent borrowing
unlikely. For example, Lat. aut means ‘or’, but Osc. avt means ‘but’. Lat.
enim means ‘then’, but Osc. inim means ‘and’. We may also note some
interesting specializations of meaning and/or form not found outside of Italic:
Latin, Sabellic, and Venetic all have the stem *diē- generalized from the
Stang’s Law outcome of *dieum as the word for ‘day’: Lat. dies, Ven. loc.
diei, Osc. zicolom, Umb. ticel < *diekelos. Though the accusative form was
obviously inherited (Ved. dyā́m, Phryg. Τιαν, Gr. Ζῆν(α)), it is only in the Italic
languages that a new paradigm has been formed specifically in this meaning.
Only in Latin and Sabellic does *h₂eh₁seh₂ mean ‘altar’ (Lat. ara, Umb. asa
abl.sg., Osc. aasai loc.sg.). The Hittite cognate ḫaššaš means ‘hearth’ and Ved.
ā́sa- m. means ‘ashes’. Latin, Sabellic, and Venetic have created a neo-root
*dʰeh₁k- ~ *dʰh₁k- from a k-extended stem originally at home in the active
singular of the aorist (Lat. facere ‘to make’, Umb. facia, Osc. fakiiad pres.
subj.3sg., Ven. vhagsto pret.3sg., vs. Gr. ἔθηκε, Phryg. addaket aor.act.3sg. vs.
ἔθετο aor.mid.3sg.).³³


30 Closely related to the ablative phenomenon is the extension of originally instrumental adverbs
in *-eh₁ > *-ē by -d: OLat. FACILUMED ‘very easily’, Osc. amprufid ‘improperly’.

31 One might add *-idʰo- if we were sure that the Sabellic place name Callifae were to be equated
with Lat. callidus ‘experienced’ < ‘*hardened’ or Lat. calidus ‘warm’. But I am skeptical of this
because the name is attested only once at Livy 8.25 beside the much better attested Allifae and
Allifae is known to have had a long i (Allifana Hor. S. 2.8.39, Ital. Alife).

32 See also the listing and discussion in Fortson 2017: 843-5.

33 There are also many items that are attested exclusively in Latin and Sabellic. To mention just
two: *kubā- ‘lie’ (Lat. cubat, SPic. qupat) largely replacing *legʰ- except in Fal. lecet, SPic.
veiiat, and Lat. lectum ‘bed’, *fameliia ‘household’ (Lat. familia, Umb. fameria) derived from
*famelo- ‘slave’ (Lat. famulus).
All in all, the phonological, morphological, and lexical innovations shared
between Latin, Sabellic, and Venetic (when available) are too numerous and 
integrated to be the result of secondary approximation alone. At the same time, 
there are quite appreciable differences between the Italic languages. Given the 
fragmentary state of the Italic languages other than Latin, it is hard to know 
exactly how different Latin and Sabellic were in, for example, the second 
century BCE. In my opinion they were synchronically much less closely related 
or mutually intelligible than the old Germanic languages but more closely 
related than Old Irish and Middle Welsh or Lithuanian and OCS. From this 
we infer that there must have been quite a long period of divergence before the 
forms of Italic began to converge again in the historic period. Whether this 
inference can be made to correlate with any plausible archaeological or genetic 
scenario is an open question. 

Two final points on the question of Proto-Italic: What would prove that the 
Sabellic languages and Latin do not belong to the same subgroup? Well, 
imagine, as a thought experiment, that a scholar claimed that Sabellic and 
Germanic formed a subgroup within Indo-European. This would be refuted to 
most people’s satisfaction by pointing out that Sabellic and Germanic share no 
innovations that (1) are not shared with other groups as well and (2) precede the 
earliest separate innovations in these two groups. If our imaginary proponent of 
Sabello-Germanic retorted that Proto-Sabello-Germanic should be recon- 
structed at a stage that could account for both sets of developments, we 
would respond that such a reconstructed state would be virtually identical to 
Nuclear Proto-Indo-European and therefore any Nuclear Indo-European lan- 
guage would be derivable from it. 
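
The refutation test in this thought experiment can be phrased as a small set computation. The following Python sketch is only an illustration under stated assumptions: the `Innovation` record, the `subgroup_evidence` helper, and the feature labels are hypothetical, the data are toy placeholders rather than an actual inventory of innovations, and condition (2), the relative chronology, is reduced to a hand-assigned flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Innovation:
    label: str                        # hypothetical feature name
    branches: frozenset               # every branch that shows the innovation
    predates_separate_changes: bool   # hand-assigned stand-in for condition (2)


def subgroup_evidence(innovations, branch_a, branch_b):
    """Return the innovations that count as evidence for grouping two branches.

    Condition (1): the innovation is found in exactly these two branches.
    Condition (2): it precedes the earliest separate innovations of each.
    """
    pair = frozenset({branch_a, branch_b})
    return [
        inn
        for inn in innovations
        if inn.branches == pair and inn.predates_separate_changes
    ]


# Toy data, invented purely for illustration.
TOY_INNOVATIONS = [
    Innovation("feature_A", frozenset({"Latin", "Sabellic"}), True),
    Innovation("feature_B", frozenset({"Sabellic", "Germanic", "Celtic"}), True),
    Innovation("feature_C", frozenset({"Sabellic", "Germanic"}), False),
]

latin_sabellic = subgroup_evidence(TOY_INNOVATIONS, "Latin", "Sabellic")
sabello_germanic = subgroup_evidence(TOY_INNOVATIONS, "Sabellic", "Germanic")
```

With these toy records, only `feature_A` survives both conditions, so the hypothetical Sabello-Germanic grouping is left without support: `feature_B` fails condition (1) and `feature_C` fails condition (2).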

If we return now to reality, can the proponents of the independence of 
Sabellic and Latin point to any innovations in either language group that 
make it impossible to derive the other branch from any other common ancestor 
than the proto-language? There are a few cases where the differing outcomes of 
the two groups lead to the reconstruction of the PIE state of affairs (e.g. syllabic 
nasals, labiovelars), but these must be weighed against the instances where this 
is not the case. Finally, if, on theoretical grounds, we believe that only binary 
branching is possible,³⁴ denying an Italic subgrouping immediately raises the 
question of what Sabellic or Latin should be grouped with instead. And when 
we put the question in those terms, it becomes clear that there is no other branch 
that could be more closely grouped with either Latin or Sabellic. And thus one 
would be forced to the position that, despite the evident shared innovations, 
Latin or Sabellic is more closely related to some language of which there is no 
trace. 


34 See Hale 2007: 238-9 on this point. 
8.3 The Internal Structure of Italic 


There are actually fairly few innovations on the Latin side that can be shown to 
encompass all the Latin dialects and to have taken place before the onset of the 
historical record. In many cases the fragmentary dialects don't preserve the trait 
in question. For example, the productive v-perfect formant is attested in Roman 
Latin (earliest epigraphical example PROBAVET ‘approved’ from the Egadi 
rostra dated before 241 BCE; see Prag 2014) and in Praenestine (CAILAVIT 
‘chiseled’) but not in Faliscan. This is presumably an accidental gap. The scant 
corpus of Faliscan does not preserve any alternative morpheme in the same 
functional slot. In some cases what would appear to be a defining innovation of 
Proto-Latin can be seen not to have affected all the dialects or to have been 
a later diffused change. Roman Latin, for example, has changed medial *β and 
*ð to stops b and d, and *ɣ to h, whereas Faliscan has f for the reflex of medial 
*β and *ð and apparently a stop g as the reflex of *ɣ (Fal. lecet ‘lies’ < *legʰeti). 
Thus Proto-Latin must have had fricatives *β, *ð, and *ɣ, and not *b, *d, and 
*h as Roman Latin. With these cautions in mind, we may point to the following 
Latin innovations, which are assumed to be Proto-Latin in the absence of 
evidence to the contrary. 

As far as phonology is concerned, few secure innovations define the Latin 
node; most prehistoric phonological innovations are on the Sabellic side. One 
Latin innovative feature is the shortening of long vowels before final -m, which 
did not happen in Proto-Italic since South Picene and Oscan preserve distinct 
reflexes of a long vowel in the genitive plural in *-ōm and Oscan retained a long 
vowel in this environment at least in monosyllables (Weiss 1998; Zair 2016: 
82). There was eventually such a shortening in polysyllables in the Sabellic 
languages, but this is an independent change. Another phonological innovation 
on the Latin side is the contraction of some unlike vowel sequences after the 
loss of intervocalic yod. Whereas like vowels contracted already in Proto-Italic 
(e.g. the iterative-causative suffix *-eie- > *-ee- > *-ē- Lat. ē, PSab. *ē), the 
sequence *-āiō contracts in Latin to -ō but remains uncontracted in Sabellic 
(Lat. vocō ‘I call’ vs. Umb. subocau ‘I invoke’ < *subuokāiō). Likewise, the 
sequence *-āiē- contracts to ē in Latin but remains uncontracted or perhaps 
diphthongizes in Sabellic (amēs ‘love’ subj.2sg. < *amāiēs, vs. Osc. deivaid 
‘may he swear’ < *deiuāiēd).³⁵ 

There are other cases where both Latin and Sabellic have innovated in 
different ways from the Proto-Italic situation. For example, Latin has elimin- 
ated the nasal in the sequence *-Vns#, e.g. in the acc.pl. (VOL DEIVOS ‘gods’ 
acc.pl.). The same treatment is found in Venetic (Ven. deivos ‘gods’ acc.pl.). In 
Sabellic, on the other hand, the development of final *-ns and *-nts is to -f 


35 Cf. also the uncontracted form Umb. ahesnis ‘brazen’ abl.pl. < *aiesno-, although in this case 
the Latin cognate aenus also remains — surprisingly — uncontracted. 
(Umb. vitluf ‘calves’ acc.pl., traf ‘across’ ~ Lat. trans < *trants). The sequence 
that gave -nd- in the Latin gerund(ive) gave -nn- in the corresponding Sabellic 
morpheme (Osc. úpsannam ‘to be constructed’; Umb. pihaner /pianner/ ‘to be 
purified’ = Lat. piandi). 

Sabellic, on the other hand, has a number of distinctive phonological innov- 
ations. The most salient is the change of the voiced and voiceless labiovelars to 
labial stops, e.g. *kʷis ‘who’ > Osc. pis, Umb. pis-i = Lat. quis; *gʷih₃uo- 
‘alive’ > Osc. bivus ~ Lat. vivi, *gʷem- ‘come’ > Umb. benust fut.perf.3sg. ~ 
Lat. vēnerit.³⁶ Venetic agrees with Latin in preserving the voiceless labiovelar 
and turning *gʷ into w (kve ‘and’, vivoi ‘alive’ dat.sg.). Another distinctive 
feature of Sabellic is across-the-board syncope of a short vowel before a final 
-s (Osc. hurz ‘garden’ = Lat. hortus, Osc. bantins ‘from Bantia’ < *bantinos). 
In Latin and Venetic this type of syncope is more limited and occurs chiefly 
after r (Lat. sacer ‘sacred’ < *sakros,³⁷ Ven. teuters ‘public’ < *teuteros), but 
not elsewhere. In Sabellic, stops were lenited to fricatives before a dental stop, 
so *pt > ft, *kt > ht (Osc. scriftas ‘written’ nom.pl.f. ~ Lat. scriptae, úhtavis 
‘Octavius’). In Umbrian *ft > ht (screihtor ‘written’ nom.pl.n.). The voiced 
labial and dental fricatives -β- and -ð-, which occurred in medial position as the 
reflexes of the PIE voiced aspirates, merged as β (f) (Osc. mefiai ‘in media’, 
Umb. rufru ‘rubrōs’). In Latin and Venetic these are kept separate, ultimately 
becoming stops in Latin and probably eventually in Venetic too (mediai 
‘middle’ loc.sg. from *medʰio/ā-, louderobos ‘children’ dat.pl. from *h₁leudʰeros, 
cf. Gr. ἐλεύθερος ‘free’). In initial syllables, syllabic nasals developed to *aM in 
Sabellic but to *eM in Latin (Osc. fangvam ‘tongue’ acc.sg. < *dʰn̥ǵʰuā-, an- 
neg. < *n̥- vs. OLat. dingua < *dn̥ǵʰuā, in- < en-). But elsewhere the development 
is to en as in Latin (Umb. desen-duf ‘twelve’, cf. Lat. decem ‘ten’). In Venetic the 
outcome is -an at least in final syllables (donasan ‘they gave’ < *-sn̥t). 
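
The labiovelar correspondences described in this paragraph can be written out as a small rewrite table. The Python sketch below is an illustration only: the `REFLEXES` table and the `apply_labiovelar_reflex` helper are hypothetical names, the labiovelars are written as kʷ and gʷ, and only the voiced and voiceless labiovelars are modelled, with no other sound change, so the outputs are schematic rather than full derivations of attested forms.

```python
# Reflexes of the voiceless and voiced labiovelars as stated in the text:
# Sabellic turns them into labial stops; Latin and Venetic keep the voiceless
# labiovelar (spelled qu / kv) and turn the voiced one into w (spelled v).
REFLEXES = {
    "Sabellic": {"kʷ": "p", "gʷ": "b"},
    "Latin": {"kʷ": "qu", "gʷ": "v"},
    "Venetic": {"kʷ": "kv", "gʷ": "v"},
}


def apply_labiovelar_reflex(form: str, branch: str) -> str:
    """Replace each labiovelar in a schematic proto-form by its branch reflex."""
    result = form
    for proto_segment, reflex in REFLEXES[branch].items():
        result = result.replace(proto_segment, reflex)
    return result


sabellic_who = apply_labiovelar_reflex("kʷis", "Sabellic")
latin_who = apply_labiovelar_reflex("kʷis", "Latin")
venetic_and = apply_labiovelar_reflex("kʷe", "Venetic")
```

Applied to the schematic input `kʷis`, the table yields `pis` for Sabellic and `quis` for Latin, matching the forms cited above; `kʷe` yields Venetic `kve`.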

Turning to morphology, the innovations are more evenly distributed between 
Latin and Sabellic.³⁸ Oscan and very probably Umbrian have remade the 
inherited nom.sg. of n-stem nouns with -o as the final-syllable vowel by 
introducing the n from the oblique stem and recharacterizing the nominative 
with -s. The resulting sequence gave -f, e.g. Osc. úíttiuf ‘use’ < *oition-s vs. 
Lat. -io, -ionis. In the -eh₂-stems, Sabellic retains a contrast between a reflex of 
*-ā < *-eh₂ in the nom.sg. (Osc. viu, Umb. Turso [name of a goddess]) and *-a 
in the vocative (Umb. Tursa) whereas Latin has a surprising and not satisfac- 
torily explained -a in both the nom. and voc.sg. In Sabellic, the proterokinetic 


36 The Sabellic treatment of the voiced aspirate labiovelar was to f in medial position: *h₁uegʷʰ- > 
Umb. vufru ‘votive’ ~ Lat. voveo. There are no good examples of initial *gʷʰ, but it would be 
surprising if it was anything other than f. 

37 The nom.sg. SAKROS is attested in VOL but it is probably an analogical restoration. 

38 For a survey of Italic morphology with many references, see Vine 2017. 
i-stem gen.sg. ending *-eis was generalized to the o-stems and consonant stems 
(medíkeís ‘meddix’ gen.sg.). In VOL o-stems retain -osio, as in VALESIOSIO 
‘of Valesios’ (Hom. Gr. -οιο, Ved. -asya, etc.) beside -ī, which eventually replaces 
-osio, and consonant stems retain -es and -os. But note that in the case of *-eis, 
Sabellic preserves an ending eliminated by Latin. Sabellic also gets rid of the 
athematic accusative singular ending *-em < *-m̥, replacing it with thematic *-om. 
But, on the other hand, Sabellic retains the athematic nom.pl. *-es, which is 
syncopated to -s (Marruc. medix ‘meddix’ nom.pl. < *medikes), whereas Latin has 
eliminated this ending in favor of -ēs < *-eies originally from the -i-stems. Sabellic 
also retains the thematic nom.pl. *-ōs and the ā-stem nom.pl. *-ās, which Latin has 
replaced with pronominal *-oi > -ī and analogical -ai > -ae, just as in many other 
Indo-European languages. Sabellic has even extended the thematic nom.pl. 
nominal ending to the nonpersonal pronouns (Osc. pús ‘who’ nom.pl.m. vs. Lat. 
qui). The neut.pl. in Sabellic has generalized the thematic ending -ā < *-eh₂ to 
athematic forms (Umb. triiuper ‘three times’), but in Latin the generalization has 
gone the other way with -a < *-h₂ in all paradigms. 

In pronominal morphology, Latin has extended the accusatives of the 
singular personal pronouns by -(V)d (VOL, Fal., Praen. med) whereas 
Sabellic has used the particle *-om (OUmb. míom). Sabellic retains the 
oblique stem formant -sm- in the anaphoric and relative-interrogative stem 
(SPic. esmín loc.sg., Umb. esmik dat.sg., pusme ‘to whom’), which Latin has 
replaced (isti eiiei, cui, quo). Sabellic has an innovative oblique stem of the 
anaphoric pronoun *eis- created by reanalysis of the genitive plural *eisom, 
e.g. dat.pl. Osc. eizois, Umb. erer-unt. Sabellic has a unique proximal deictic 
stem *eko-/ekso-. In Oscan these stems are suppletive, with *eko- forming the 
nom.-acc. and *ekso- forming the oblique stem. Umb. has a unitary stem 
*esso- < *ekso-, which may be the older situation. The corresponding Latin 
proximal deictic is made from a stem *ho- (Lat. hic, Fal. hec ‘here’), which 
may be continued in the Umb. pronominal form erihont ‘the same’ nom.sg.m.⁴⁰ 

In the personal ending of the verb Latin has generalized the thematic 3rd person 
plural *-ont(i) to athematic forms (sont ‘they are’; exception: opt.3pl. sient) 
whereas Sabellic has extended the range of the ending -ent (Osc. fiie(n)t), though 
-ont does survive in Pre-Samnite fofroö ‘they were’. In the primary 2nd plural SPic. 
has -tas (videtas ‘you see’), which must be from *-tās since *-tas would have 
syncopated to *-ts. But Latin has an incompatible -tis. Most probably, Proto-Italic 
had primary 2du. *-tas, 2pl. *-tes;⁴¹ secondary 2du. *-tā, 2pl. *-te. Latin would 


39 Latin does have instances of nom.pl. -ās, but these are probably not archaisms. See Weiss 
2020: 252. 

40 On the Sabellic demonstrative system, see Dupraz 2012. 

41 There is little evidence for 2pl. primary *-tes outside of Italic. The ending *-tes may itself have 
been an analogical creation on the model of primary *-me/os. 
have generalized the plural endings while Sabellic generalized the dual endings and 
leveled the *ā from the 2du. secondary to the primary dual ending. In 3rd person 
middle endings, Umbrian appears to have preserved the PIt. situation with primary 
-ter and -nter < *-tro and *-ntro contrasting with secondary endings in *-tor, -ntor 
(Umb. terkantur ‘let them see’). Oscan has generalized the primary endings, and 
Latin has generalized the secondary endings. Sabellic also retains t-less mid.3sg. 
forms (Umb. ier ‘one goes’, ferar ‘one should carry’, Osc. loufir ‘or’, lit. ‘(if) it is 
wished’), which Latin has eliminated without a trace. The mid.2pl. is not attested 
in Sabellic but is partially inferable on the basis of the deponent future ipv. ending 
*-mo (Umb. persnimu ‘pray’, Osc. censamur) which must have been created 
like the corresponding Lat. -minō on the basis of the 2pl. middle ending. This 
form therefore began with *m-. In the endings of the perfect system, Latin has for 
the most part preferred the endings originating in the PIE perfect (1sg. -ai, 
2sg. *-istai, 3sg. -eit, 3pl. -ēre, all of perfect origin), but has also incorporated 
some originally aorist endings (3sg. -ed, 3pl. -ēr-ont, and -(er)ont ← *-ond, 
cf. Fal. fifiqo(n)d). Sabellic has only aoristic endings in the forms we know: 
1sg. -om, 3pl. -ens, Pre-Samn. -o(v)ó. The ending -e of the perf.2pl. form Pael. 
lexe ‘you (pl.) have read’ is sometimes compared to Ved. perf.2pl. -a but, 
given the overall aoristic provenance of the perfect endings of Sabellic, this is 
improbable. The ending may ultimately be from *(-s)te. 

When we turn to tense, aspect, and mood, Latin has cobbled together 
a future tense out of (1) an original periphrastic with *bʰuh₂-, which gives 
the b/f-future (Fal. carefo ‘I will lack’, Lat. carebo ‘I will lack’), and (2) the 
PIE subjunctive (athematic ero ‘I will be’, thematic dūces ‘you will lead’). 
Sabellic has a probable trace of the thematic subjunctive formant e in SPic. 
knúskem ‘know’ 1sg., but it is difficult to say whether this is used in 
subjunctive or future function. Otherwise, Sabellic has generalized the athe- 
matic s-future to all stem types. Latin once had comparable forms, but only 
the PIE subjunctive and optative of this type survive in the mainly OLat. faxo 
(fut.), faxim (subj.) type. In addition, the Latin future perfect and perfect 
subjunctive (fecero, fecerim) employ this same morpheme after a union 
vowel /i/ appended to the perfect stem. Sabellic forms the future perfect 
with the same athematic s-morpheme added to a union vowel -u-,⁴² but the 
perfect subjunctive does not use the s-morpheme at all, instead adding -e- to 
the perfect stem (Osc. tribaraka-tt-i-ns). This e is presumably the same as 
the subjunctive formant e used in the present stem. 

The stem formation of the perfect is very divergent between Latin and 
Sabellic.⁴³ While both Latin and Sabellic continue reduplicated perfects, 


42 See on this category most recently Zair 2014. 

43 This disagreement contrasts strikingly with the general agreement of present stem formation 
between Latin and Sabellic. The reasons for this contrast are presumably (1) that the historical 
"perfect" of Latin and Sabellic results from the parallel independent merger of the PIE perfect 
simple perfects (perhaps thematic aorists or dereduplicated perfects), and 
lengthened-grade perfects, Latin has utilized the s-aorist formant much more 
than Sabellic, which does not have a single completely certain example. 
Venetic, on the other hand, seems to have generalized the s-aorist as the 
productive perfect formant (donasto ‘gave’ 3sg., donasan 3pl., vhagsto 
‘made’ 3sg.). Latin has greatly expanded the range of the v-perfect, of which 
Sabellic has no trace. Sabellic has a number of innovative perfect formants 
mostly of quite unsettled origin. These are the -tt-perfect and the supposed 
-k-perfect of Oscan, the -nki-perfect of Umbrian, and the -ō-perfect of South 
Picene.⁴⁴ 

In nonfinite morphology, Latin and Sabellic have made different choices in 
grammaticalizing different case forms of different stem types as an infinitive. 
Latin makes use of the locative of an s-stem (dūcere ‘to lead’ < *deukesi) 
whereas Sabellic used the accusative in -om (Osc. tríbarakavúm ‘to build’), 
which might originate in either a thematic or a root noun (since Sabellic 
replaces *-em with -om). For the medio-passive, Sabellic retains the 
form *-fiē or *-fiei (Umb. cehefi ‘to be taken’, Osc. sakrafir ‘to be sanctified’), 
which is an instrumental or dative of the same piece of nominal morphology that 
gave Ved. -dhyai and Toch.B, Toch.A -tsi. Latin has perhaps redone the expected 
cognate *-diē as -rier to create its passive infinitive (see Fortson 2012; 2013). 

Nominal derivational morphology is overall quite similar between Latin and 
Sabellic. Most suffixes found in one branch are also found in the other in more 
or less the same function. One notable difference is that Sabellic has no 
hesitation in adding the suffix -iio- to a base in -iio-. This is the origin of the 
Sabellic (mainly attested in Oscan) gentilics in -iis -jes (statiis < *statii-ii-os 
derived from the praenomen staatis < *statii-os). Latin has no trace of such 
iterative derivation and prefers formations like Lūcilius and Manilius from 
Lūcius and Manius or Latinus from Latium. An interesting mismatch between 
Latin and Sabellic is shown by Lat. fanum ‘shrine’ < *fasnom vs. Osc. fíísnu 
nom.sg.f., where Latin continues a zero-grade of the root *dʰh₁s- and 
Sabellic reflects a full-grade *dʰeh₁s-. Since ablaut in a -no- or -na-stem 
is unlikely, it is possible that the derivational base showed ablaut or that 
one or the other forms may have been remade on the basis of related 
elements. 

Defining the distinctive lexicon of Latin vs. Sabellic is challenging given the 
incommensurate sizes and natures of the corpora. For example, even if we 
combine all Sabellic languages, we still know fewer than eighty of the 200 


and aorist and (2) that denominal verbs had not yet acquired a productive perfect formant. For 
the development of the perfect system of Italic, see in general Meiser 2003. 

44 On these formants, see for the most recent proposals and a review of earlier scholarship: Willi 
2010; Dupraz 2016: 340 (-nki-); Willi 2016; Dupraz 2016: 347 (-tt-); Dupraz 2018 (-k-); Zair 
2014 (-ō-). 
items on the Swadesh list. It is often impossible to determine whether Latin and 
Sabellic agree on any particular lexical innovation. Nevertheless, something 
can be said about the distinctive lexical profile of Sabellic (see Buck 1928: 
12–17). There are a small number of cases where Sabellic retains a form of a root or 
particle that has been completely eliminated from Latin.⁴⁵ Sabellic, like Vedic, 
has reflexes of both *uih;ro- (Umb. ueiro acc.pl.n.) and *h₂ner (SPic. níír, 
Umb. nerf acc.pl. etc.) in the meaning ‘man’ while Latin has eliminated the 
latter.⁴⁶ The Indo-European word for ‘daughter’ is preserved in Osc. futir 
while Latin has replaced it with the neologism filia. Likewise, an old word 
*putlo- ‘son’ survives in Osc. puklum acc.sg., cf. Ved. putrá- in contrast to Lat. 
filius. Umbrian preserves the r/n-stem for ‘water’ (utur acc.sg., une abl.sg.) 
while Latin only continues the derivative unda ‘wave’⁴⁷ and has replaced the 
basic word with a West IE neologism *akʷā > Lat. aqua.⁴⁸ Sabellic has a word 
for ‘god’ *aiso- (Osc. aisás nom.pl.) that it shares with Venetic (aisus), to the 
exclusion of Latin. Interesting are the divergent prepositional/preverb forms: 
Osc. aa-, Umb. aha- ‘to’ (OHG uo-) with no analog in Latin; Osc. eh, Umb. ehe 
‘from’ < *eǵʰ, which, like Lith. iz, preserves an s-less form of this particle 
whereas Latin has only ex and its further developments; and *dād ‘from’ 
(Osc. dat, Umb. da-), which has no analog outside of Sabellic at all. Some 
other Indo-European roots and words are preserved uniquely in the Sabellic 
branch: *ad- ‘law’ (Umb. arsmo ‘rites’, OIr. ad ‘law’, ada ‘fitting’), Umb. 
e-iscurent ‘seek’ fut.perf.3pl. (Ved. iccháti),⁴⁹ Osc. cadeis ‘hostility’ (OHG haz 
‘hate’), Osc. mais ‘more’ (Goth. mais ‘more’), *nertero- ‘left’ (Umb. nertru, 
Osc. nertrak ~ ON norðr ‘north’), Osc. nessimas nom.pl.f. ‘nearest’ 
(OIr. nessam ‘nearest’), Umb. pir ‘fire’ (Gr. πῦρ, but possibly in Lat. pūrgāre 
‘to purify’), Umb. terkantur ‘look’ subj.3pl. (Gr. δέρκομαι), Osc. touta 
‘people’ (OIr. tuath, Goth. þiuda, etc.). There appears to be no particular 
pattern to these items. Some have matches only in the Northern European IE 
languages (*ado-, *touta). Most have widespread cognates. 

The syntax of Sabellic and Latin is very similar, but this may be partly the 
result of the generic similarities of epigraphical documents from Central Italy. 
The use of the locative case in Latin has been greatly curtailed, but the Sabellic 


45 I am intentionally omitting any proposal of my own. 

46 The gens Claudia, of legendary Sabine origin, introduced the praenomen and cognomen Nerō 
‘strong’ into Latin. 

47 Unless unda is deverbative to *u-ne-d-ti (Ved. unatti ‘flows’). 

48 Oscan aapam acc.sg.fem. is usually compared with Ved. āp- ‘water’, but this root is *h₂ep- and 
would not have a long ā in its paradigm. The Oscan word does not mean ‘water’ per se but most 
likely ‘water works’ vel sim. and can be explained as the substantivization of an inner-Italic 
vrddhi formation *āpo- ‘of water’, but this could just as well be derived from *apā < *akʷā as 
from *ap- < *h₂ep-. 

49 Unless the root *h₂eis- is somehow continued in Lat. quaerere ‘to seek’ and aeruscāre ‘to beg’. 
languages continue to use it freely: Osc. eisei terei ‘in this territory’, comenei 
‘in the assembly’. 


8.4 The Relationship of Italic to the Other Branches 


Italic, unsurprisingly, shares many features with other European branches of 
Indo-European. Meillet (1922) famously recognized a “civilization of the 
Northwest”, which was shared by the branches that eventually became Balto- 
Slavic, Germanic, Italic, and Celtic.⁵⁰ 

One occasionally sees the claim that Italic has an especially close relation- 
ship with Germanic (most recently Kuz’menko 2011), but there are in my view 
no innovations in phonology or inflectional morphology shared exclusively by 
Italic and Germanic.⁵¹ Most of the innovations listed by proponents of Italo- 
Germanic such as Hirt 1897 or Devoto 1936 are not correct, are attested 
elsewhere, or are suspect of being parallel developments.⁵² 

That said, there are still a number of unique Italo-Germanic agreements in 
lexicon. Only in Germanic and Italic is the suffix *-no- added to the multiplica- 
tive adverb *duis ‘twice’ to create *duisnó- ‘double’, a proto-form continued in 
the Latin distributive numeral bini ‘two at a time’ and in Germanic *tuizná- 
(ON tvennr ‘two-fold’, OE ge-twinn ‘twin’, OHG zwirnon ‘to twine’). But the 
addition of the suffix *-no- to adverbial forms is well known elsewhere in Indo- 
European, cf. Ved. purāṇá- ‘ancient’ ← purā́ ‘of old’. Lat. vadum ~ ON vað 
n. ‘ford’, OHG wat, OE wæd ‘sea’ is a perfect and exclusive match. Assuming 
Germanic *ga- really is cognate with Latin com-, the match between Lat. 
commūnis and Goth. gamains etc. is striking. An innovative word for ‘year’, 
Lat. annus, Osc. akenei loc. sg., and Goth. aþna-, is added to PIE *hjieh;r, 
which survived in both Italic (hornus ‘this year’ < *ho-iVrno-) and Germanic 
(Goth. jer), but this word is probably closely related to the Celtic words for 
‘time’ OIr. am, Gaul. amman (see Stifter 2017: 220-2). The words for ‘barley’, 
OHG gersta and Lat. hordeum, reflect two different genitival derivatives of 
a *ǵʰr̥sdo- ‘prickly plant’ (OE gorst). Italic and Germanic share not one but two 
exclusive words for ‘be silent’: (1) Lat. tacere, Umb. tagez, Goth. þahan, ON 


50 For more on the Indo-European of the Northwest see Chapter 7. 

51 Polomé 1966 refutes the supposed morphological and semantic isoglosses well, though 
obviously we might conduct the refutation somewhat differently today. 

52 The supposed match between the -ne of Lat. superne and the -na of Goth. utana ‘from the 
outside’ is vitiated by the fact that Latin superne has a short final -e (cf. also dōnicum ‘until’ < 
*donVkʷom), and the suffix -nV added to preverbs is found widely in Indo-European, e.g. Hitt. 
istarna ‘among’. The derivation of adverbs from preverbs and pronominal bases with *-t(e)rō or 
*-t(e)rōd as in Lat. ultro ‘willingly’, Osc. contrud ‘contrary to’, and Goth. innaþro ‘from 
within’, etc. is not exclusive (cf. Gr. προτέρω ‘further’ etc.), and the semantic match between 
Italic and Gothic is not good. While the Gothic forms have ablatival meaning, the same is not 
true of the Italic forms. 
þegja, OHG dagēn and (2) Lat. silere, Goth. anasilan ‘to grow calm’. Both 
verbs have been compared more widely (LIV² 495 explains tacere ~ þahan as 
a semantic specialization of *pteh₂k- ‘to cower’; others prefer to connect these 
forms with OIr. tachtaid, MW tagu ‘strangle’ < ‘silence’, and silere has been 
compared to OIr. silim ‘pour’ and more generally to the root *seh₁-i- ‘let’), but 
even if these somewhat difficult comparisons are correct, the close formal and 
semantic match between Germanic and Italic remains. 

A word needs to be said about two matches between Venetic (considered 
here as an Italic language) and Germanic. The Venetic accusative of ego ‘I’ is 
mego, and this has been compared to the Germanic accusative reflected by 
Goth. mik. Given the different extensions of the accusative stem of the 1sg. 
personal pronoun seen in Latin (OLat. med), Sabellic (OUmb. miom), and 
Venetic, it seems that Proto-Italic must have actually retained an unextended 
accusative *mē or *me.⁵³ This would mean that mego would have resulted 
from a secondary influence of Germanic, but given the fact that mego can be 
explained as an inner-Venetic conflation of *egō and *me, it's preferable to 
leave the Germanic and Venetic forms unconnected.⁵⁴ The second Venetic- 
Germanic isogloss, Venetic sselboisselboi ‘for himself’ ~ Goth. silba ‘self’, 
OE seolf, OHG selbselbo, ON sjálfr is unquestionable and not likely to be 
coincidental. But it should be noted that the Venetic form occurs on one of the 
latest Venetic inscriptions, written in the Roman alphabet. It's possible that 
this is the result of a quite recent and perhaps one-off borrowing from 
Germanic. 

Although there have been the occasional attempts to connect Italic more 
closely with one or another branch of Indo-European, these are mainly of 
historical interest. In the early days of Indo-European, some scholars 
posited a Greco-Italic group, e.g. Georg Curtius (1858: 22).°° In both 
Greek and Italic the voiced aspirates end up as voiceless segments at 
least in word-initial position, but the behavior of medial voiced aspirates 
is quite different.°° Greek and Italic share an innovative nominal genitive 
plural for *eh>-stems *-asom (Lat. -arum, Osc. -azum, Hom. Aeol. Gr. 
-Gov), but this is an introduction of the pronominal form (cf. Ved. tasam 
*of these") into the nominal paradigm, which could have happened inde- 
pendently. An interesting agreement in derivational morphology is the 
extension of the stem *deks(i)- ‘on the right’ by the oppositional suffix 
*-teros (Lat. dexter, Umb. destram, Gr. óecivepóc) in contrast to *deks(i)-uos 
(also in Gr. óe&(p)óc, Gaul. dexiva, Goth. taihswo ‘the right hand’) or 
*deks(i)-no- (Ved. daksina-, YAv. dasina-, Lith. désinas, OCS desnv). The 


53 I’ve suggested (Weiss 2018b: 351) that exactly such a form, me, is continued in Venetic me in the 
Isola Vicentina inscription. 

54 Cf. also Hitt. ammuk. 

55 The most recent proponent of this view was Thibau (1964). 

56 The voiced aspirates were also devoiced in Romani. 
antonym σκαι(ϝ)ός ~ scaevus ‘left’ is also limited to Greek and Latin. An 
exclusive match of derivation and meaning in a culturally important word is 
Lat. liber ‘free’, pl. liberi ‘children’, Ven. louderobos ‘children’ dat.pl., Gr. 
ἐλεύθερος ‘free’. The use of the suffix *-ero- suggests that *h₁leudʰeros 
originally meant ‘(one who is) in the people’ (*h₁leudʰo- > ON ljóðr 
‘people’, OE lēod ‘man’, OCz. lud ‘people’) in opposition to those outside 
of the community. 

Martynov (1978) has indicated a number of cases where Slavic has two 
words for the same thing, one of which closely matches Italic. Most of these 
comparisons don't hold up — the two best are OCS mesece ~ OCS luna ‘moon’, 
cf. Lat. ina and OCS star» ~ OCS mator» ‘old’, cf. Lat. matürus — but these 
are not significant enough to support any theory of closer connection between 
Italic and Slavic, let alone Martynov’s view of a prehistoric conquest of pre- 
Slavs by pre-Italic speakers. 

Finally, Melchert (2016), following in the footsteps of Puhvel (1994), has 
pointed out a few instances where Latin shares some innovative features with 
Anatolian, e.g. (1) HLuv. REL-ipa ‘indeed’ ~ Lat. quippe ‘indeed’ < *kʷid-pe, 
(2) Lyd. nav, nä-m qid ‘whatever’ ~ Lat. quidnam ‘what on earth?’, (3) Hitt. 
kappūwe/a- ‘count’ ~ Lat. computāre ‘to reckon’. Whether these agreements 
are sufficient to support some secondary contact between Proto-Italic and 
Proto-Anatolian is uncertain. 

To sum up: the similarities noticed between Italic and other Indo-European 
branches are predominantly lexical, and, when we compare these similarities to 
the ones noted between Italic and Celtic, the case for Proto-Italo-Celtic seems 
all the stronger (Chapter 7). 


8.5 The Position of Italic 
See Chapter 7. 
9 Celtic 


Anders Richardt Jørgensen 


9.1 Introduction 


This chapter provides an outline of the defining characteristics of the Celtic 
proto-language and the internal divisions within Celtic. Only languages which 
are clearly identifiable as Celtic will be included in this treatment, i.e. Goidelic, 
Brittonic, Gaulish (including Cisalpine, Transalpine and the onomastic mater- 
ial from Central European and Balkanic Celtic), Celtiberian, Lepontic and 
Galatian. Pictish, Tartessian and Lusitanian will be excluded, either due to 
the fragmentary attestation or because it is highly unlikely that the language 
belongs to the Celtic branch of Indo-European. 


9.2 Evidence for the Celtic Branch 


When listing the defining innovations of Proto-Celtic, we quickly encounter 
a problem closely linked to the poor attestation of the Continental Celtic 
languages: many of the most distinct innovatory features differentiating 
Celtic from the other Indo-European branches can strictly speaking only be 
proven to be “Proto-Goidelo-Brittonic”, and it is unclear how close this actu- 
ally takes us to a Proto-Celtic encompassing both the Insular and the 
Continental Celtic branches.¹ However, an area where the scant attestation of 
Continental Celtic nonetheless provides a fair amount of information is histor- 
ical phonology. Accordingly, Proto-Celtic will mainly be defined by a series of 
phonological changes differentiating it from Proto-Indo-European and the 
other Indo-European branches. This does not mean that Proto-Celtic had not 
innovated in other areas such as morphology and syntax, only that our limited 
knowledge of Continental Celtic, particularly in the area of verbal morphology, 
makes it difficult to project innovations such as the t-preterite, the s-preterite


Work on this chapter was carried out with the support of the Independent Research Fund Denmark
for the project Connecting the dots: Reconfiguring the Indo-European family tree, and of
Riksbankens jubileumsfond for the project Languages and myths of prehistory: Unravelling the
speech and beliefs of the unwritten past (LAMP).

¹ Insular and Continental Celtic are used here as geographical terms, without necessarily signifying
linguistic subgroups.
and the ā-preterite back to a specific stage beyond “Insular Celtic” (or
Goidelo-Gallo-Brittonic for that matter). For instance, it is not absolutely
certain whether the merger of the PIE aorist and the perfect into a new
preterite, though completely carried through in Insular Celtic, had necessarily
occurred by the Proto-Celtic stage.

In the following, some of the more significant innovations from PIE to
Proto-Celtic will be listed in rough chronological order, to the extent that
such an order can be established. For more detailed descriptions, see e.g.
McCone 1996: 37–104 and Stifter 2017.


9.2.1 The Centum Merger and “Thorn” Clusters


The centum merger, i.e. the merger of palatal and plain velars, is unconditioned 
in Celtic and can therefore not be placed with confidence in the relative 
chronology. Given the equally unconditioned developments in several other 
Indo-European branches (most notably the neighbouring Germanic and Italic 
branches), it is likely that this is a very early areal innovation. 

Proto-Indo-European sequences of original palatal stop + *u̯ merge with the
corresponding labiovelar in Celtic: *h₁éḱu̯o- ‘horse’ > PCelt. *ekʷo- (cf. the
Gaul. theonym Epona, OIr. ech ‘horse’, MW ebawl ‘foal’) has the same medial
phoneme as PIE *tekʷ- ‘runs’ > PCelt. *tekʷ-e/o- (OIr. techid ‘flees’, MW tebed
‘retreat, flight’).

Proto-Indo-European “thorn” clusters, traditionally reconstructed as PIE *Kþ/Gð
but in fact rather PIE *TK, underwent metathesis to *KT in pre-Proto-Celtic,
as exemplified by *h₂r̥tḱo- > *h₂r̥kto- > *arxto- > PCelt. *arto- (W arth ‘bear’,
OIr. art ‘hero’) and *dʰeǵʰom- ‘earth’ > *gdom-i̯o- > PCelt. *gdon-i̯o- ‘earthly;
mortal’ (Cisalp. Gaul. TEUO-XTONI[O]N ‘of gods and mortals’; simplified to
*don-i̯o- in later Celtic, e.g. OIr. duine, MW dyn, MBret. den ‘man’).


9.2.2 The Syllabic Liquids 


The Proto-Indo-European syllabic liquids developed a prop vowel whose
distribution is mainly conditioned by the following segment. The commonly
accepted distribution assumes the outcome *ri/li before stops and *m and
the outcome *ar/al elsewhere: PIE *bʰr̥ǵʰ- > PCelt. *brig- (Gaul. -briga,
OIr. brí, MW bre, MBret. bre ‘hill’), *kʷr̥mi- > PCelt. *kʷrimi- (OIr. cruim,
MW pryf, MBret. preff ‘worm’), *ḱr̥-n- > PCelt. *karnV- (Galat. κάρνον
‘horn, trumpet’, MW carn ‘horn, hoof’, ModBret. karn ‘hoof’), *kr̥so- >
*karso- > PCelt. *karro- (OIr. carr, MW kar, MBret. carr ‘cart’), *pr̥h₂-i >
PCelt. *(ɸ)are (Gaul. are-, OIr. air-, MW ar-). This distribution has recently
been challenged by Hill (2012), who assumes that *r̥/l̥ also gave *ri/li
before *n. This would provide a straightforward explanation for a form
such as OIr. tlenaid ‘steals’ < PCelt. *tli-na- < PIE *tl̥-n-ah₂- from the root
*telh₂- (LIV² 622), which otherwise is difficult to explain satisfactorily. The
apparent counterexample PCelt. *karnV- may be derived from PIE *ḱr̥-snV-
instead.
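
The commonly accepted distribution can be stated as two ordered rewrite rules. The following Python sketch is purely illustrative: the function name and the plain-text transcription are assumptions made here, the input is taken to show plain velars already (i.e. after the centum merger), and neither Hill's revision nor the special treatment after initial *h₂ is modelled.

```python
import re

RING = "\u0325"  # combining ring below: marks a syllabic liquid (r̥, l̥)


def vocalize_liquids(form: str) -> str:
    """Insert the prop vowel according to the commonly accepted distribution."""
    # Rule 1: *r̥, *l̥ > *ri, *li before a stop or *m.
    form = re.sub(rf"([rl]){RING}(?=[ptkbdgm])", r"\1i", form)
    # Rule 2: any remaining *r̥, *l̥ > *ar, *al.
    return re.sub(rf"([rl]){RING}", r"a\1", form)


for pre_form in ["kʷr\u0325mi", "kr\u0325so", "kr\u0325n"]:
    print(pre_form, ">", vocalize_liquids(pre_form))
```

Rule 1 has to run first, since rule 2 is the elsewhere case and would otherwise consume every syllabic liquid.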

The differing treatment of PIE *h₂r̥tḱo- ‘bear’ and *h₁r̥gʰ- ‘mount, go up’ in
Celtic, PCelt. *arto- (MW arth ‘bear’, OIr. art ‘hero’) and *rig- (OIr. fut. -rega
‘will go’, cf. McCone 1996: 62) respectively, indicates that, in initial position at
least, a preceding *h₂ caused the prop vowel to develop before the syllabic
liquid and not after it, as would be otherwise expected. This means that *h₂ still
contrasted with *h₁ when the prop vowels developed.


9.2.3 Elimination of the Laryngeals 


As is usually the case in non-Anatolian branches, the PIE laryngeals were 
eliminated as phonemes, but left traces in various ways. Word-initial laryngeals 
were lost without a trace, whether prevocalic or preconsonantal, while post- 
vocalic laryngeals in the syllabic coda were lost with compensatory lengthen- 
ing of the preceding vowel. The latter development took place before the 
restructuring of the long vowel system outlined below. In a fair number of 
instances, however, the expected lengthening does not appear, and we are 
instead left with a short vowel, e.g. PIE *u̯iH-ró- > PCelt. *u̯iro- ‘man’ (OIr.
fer, MW gwr), PIE *ǵʰuH-tu- > PCelt. *gutu- (OIr. guth ‘voice’). This
phenomenon, called Dybo's Shortening (after its first formulation in Dybo 1961),
is not restricted to Celtic but is also found in Germanic and Italic, possibly as 
part of an early areal tendency. The exact conditions leading to this shortening 
(or lack of lengthening) are not clear, and no consensus has formed as yet. For 
a recent discussion of the literature on this problem and the proposed solutions, 
see Zair 2012: 132–50.

Laryngeals between non-syllabic consonants are usually vocalized to *a, as
in e.g. PIE *ph₂ter- > PCelt. *(ɸ)ater- (OIr. athair ‘father’), PIE *sh₁-tV- >
PCelt. *satV- (MW had, MBret. hat ‘seeds’), PIE *pl̥th₂-no- > PCelt. *(ɸ)litano-
(OIr. lethan, MW llydan, MBret. ledan ‘broad, wide’). This appears to be the
case irrespective of the position of the syllable in the word, agreeing with Italic 
but differing from Germanic and Balto-Slavic, where only laryngeals in the first 
syllable appear to be vocalized to *a. 

Sequences of CRHC usually develop into CRāC as in Italic, e.g. PIE
*pl̥h₂-mah₂ > PCelt. *(ɸ)lāmā (OIr. lám, MW llaw ‘hand’), PIE *ml̥h₂tV- > PCelt.
*mlātV- (OIr. mláith ‘smooth’, MW blawd ‘flour’, MBret. bleut), but occasionally
a short vowel is encountered instead, e.g. *pr̥H-ti- > PCelt. *(ɸ)rati- (Gaul. ratis,
MIr. raith, MBret. raden ‘ferns’; cf. Schumacher 2004: 136–7). For recent
treatments of the problems relating to the development of laryngeals in Celtic,
cf. McCone 1996: 51–4; Schumacher 2004: 135–8; Zair 2012; Stifter 2017:
1194–6.


9.2.4 The Syllabic Nasals 


The development of the syllabic nasals is straightforward. As has been
demonstrated convincingly by McCone (1992: 21–6; 1996: 70–9; for the traditional
view, cf. e.g. de Bernardo Stempel 1987), apparent cases of older *eN in Irish
from PIE *N̥ may effortlessly have passed through the PCelt. stage *aN, only to
have been secondarily raised in the prehistory of Irish. Hence, we may
reconstruct PCelt. *aN as the regular outcome of PIE syllabic nasals in all instances.
This is borne out by e.g. Celtib. argato- /arga(n)to-/, Gaul. arganto-, OIr. argat,
MW aryant, MBret. archant ‘silver’ < PCelt. *arganto- < PIE *h₂(a)rǵ-n̥t-o-
and Celtib. tekam-etinas, Gaul. dekam-etos ‘tenth’, OIr. deich ‘ten’ (< *deken)
< PCelt. *dekam < PIE *deḱm̥(t).


9.2.5 The Voiced Labiovelar and the Merger of Aspirated and Plain 
Voiced Stops 


Based on MW gieu ‘sinews, tendons’, OCorn. goiu-en < Brit. plural *gi.ou
(with a secondary u-stem plural ending *-ou < PCelt. nom.pl. *-ou̯es) < PCelt.
*g(i)i̯V- < PIE *gʷ(i)i̯ah₂- (cf. Ved. jyā́- ‘tendon, string (esp. of a bow)’, Lith.
gijà ‘thread’, Gr. βιός ‘bow’) and MIr. nigid ‘washes’ < PCelt. *nig-i/i̯o- <
*nigʷ-i̯e/o- (Gr. νίζω), it appears that PIE *gʷ was delabialized to *g before
a following *i̯. For purely structural reasons we would expect PIE *kʷ and *gʷʰ
to be similarly delabialized, but there are no certain instances of this. The
delabialization must precede the shift of PIE *gʷ > *b and consequently the
merger of the PIE voiced and voiced aspirate stops (since *gʷʰ does not give
PCelt. *b, but rather PCelt. *gʷ). Therefore, it can safely be ascribed to the
pre-Proto-Celtic period, even without any evidence of the sound change from
Continental Celtic. In all other instances, when PIE *gʷ was not affected by
delabialization, it yielded PCelt. *b and as such merged with the outcome of
PIE *bʰ and the much rarer *b. This is demonstrated by e.g. Gaul. -bena, OIr.
ben, MCorn. ben-en ‘woman’ < PCelt. *benā < PIE *gʷen-h₂ ‘woman’, OIr.
biur, MW ber ‘spear’ < PCelt. *beru- < *gʷeru- and OIr. brao, MW breuan,
MBret. brou, breau ‘hand-mill, quern’ < PCelt. *brāu̯ū, *-on- < PIE *gʷr̥h₂-u̯-on-
or *gʷrah₂-u̯-on-.

At some point after the development of PIE *gʷ to *b, the PIE voiced
aspirated stops lost their aspiration and merged with the corresponding
voiced stops, e.g. PIE *bʰedʰ(h₂)-o- (cf. Lat. fodiō, -ere ‘to dig’) > PCelt.
*bedo- ‘grave’ (Celtib. argato-bezom ‘silver-mine (?)’, MW bedd, MBret.
bez ‘grave’), PIE *seǵʰ-etlo- (Gr. ἐχέτλη ‘plough-handle’) > PCelt.
*segetlo- (ModW haeddel, MBret. haezl ‘plough-handle’, MCorn. hethlor
‘ploughman’).
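
The relative chronology argued for in this section can be checked mechanically by applying the three changes as ordered string rewrites. The sketch below is an illustration only: the function names and the simplified transcription are assumptions made here, and the test item gʷʰer is a bare root shape chosen for demonstration, not a form cited in the chapter.

```python
import re

GLIDE_I = "i\u032f"  # i with inverted breve below: non-syllabic *i̯


def delabialize(form: str) -> str:
    # *gʷ > *g before a following *i̯
    return re.sub(rf"gʷ(?={GLIDE_I})", "g", form)


def gw_to_b(form: str) -> str:
    # unaspirated *gʷ > *b; aspirated *gʷʰ is left alone
    return re.sub(r"gʷ(?!ʰ)", "b", form)


def deaspirate(form: str) -> str:
    # voiced aspirates merge with the plain voiced stops
    return form.replace("ʰ", "")


def derive(form: str, rules) -> str:
    for rule in rules:
        form = rule(form)
    return form


PROPOSED_ORDER = [delabialize, gw_to_b, deaspirate]
WRONG_ORDER = [delabialize, deaspirate, gw_to_b]

for item in ["gʷenā", "nigʷ" + GLIDE_I + "e", "gʷʰer"]:
    print(item, ">", derive(item, PROPOSED_ORDER), "| wrong order:", derive(item, WRONG_ORDER))
```

With deaspiration placed before *gʷ > *b, the aspirated labiovelar would wrongly end up as *b, which is the reason the text orders the merger of the two stop series last.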


9.2.6 Changes to the Vowel System 


The long vowel system was restructured, whereby the PIE long vowel phonemes
*ē and *ō were eliminated. It is likely that this development had already occurred
in the pre-Proto-Celtic period:

PIE *ō (including PIE *oH) was eliminated, giving either PCelt. *ū (in
word-final syllables) or PCelt. *ā (elsewhere). Accordingly, it merged either with
the reflexes of PIE *ū, *uH or *ā, *aH: Celtib. n-stem nom.sg. -u, Gaul. -u,
OIr. aub ‘river’ (< *abū with u-infection) < PCelt. *-ū < PIE *-ō, Celtib.
o-stem dat.sg. -ui, Lep. -ui, Gaul. -ui < PCelt. *-ūi̯ < PIE *-ōi̯, Celtib. o-stem
abl.sg. -uz < PCelt. *-ūd < PIE *-ōd; as opposed to *ā < *oH in non-final syllables
in e.g. OIr. dán, MW dawn ‘gift, endowment’ < PCelt. *dānV- < *doh₃no-
(cf. Ved. dā́na-, Lat. dōnum).

PIE *ē (including PIE *eh₁) was raised to *ī and merged with the reflex of
PIE *ī and *iH: Celtib. ti-, Gaul. di-, MW pref. di- < PCelt. *dī < PIE *deh₁
(Lat. dē); the Gaul. onomastic element -rix /-rīxs/ ‘king’ in e.g. Dumnorix,
Vercingetorix, OIr. rí, gen.sg. ríg, MW rhi < PCelt. *rīx-s, gen.sg. *rīg-os <
PIE *(h₃)rēǵ-.

This resulted in a triangular long-vowel system, ī – ā – ū. This system was
extended with a new ē < *ei̯ and, somewhat later, a new ō < *ou̯ during the
attested history of the Continental Celtic languages. The Insular Celtic
languages may be derived from a long-vowel system with five vowel
qualities, ī – ē (< *ei̯) – ā – ō (< *ou̯) – ū, matching the five short-vowel qualities,
i – e – a – o – u.

Joseph's Law, formulated by Lionel Joseph (1982; cf. Schrijver 1995:
73–93), states that a pre-PCelt. sequence *eRa (typically from PIE *eRH) gives
*aRa. This elegantly explains numerous forms in Goidelic, Brittonic and
Gaulish, e.g. *taratro- ‘drill’ (Ir. tarathar, MBret. tarazr, Gaul. *taratro- >
Judeo-Fr. taredre /taˈreðrə/, OOcc. taraire) < *teratro- < PIE *terh₁-tro- (cf.
Gr. τέρετρον) and *talamū (OIr. talam ‘the earth, the world’) < *telamū <
*telh₂-mō, -mon- (Gr. τελαμών ‘carrying strap’), *garano- ‘crane’ (Gaul.
tri-garanus ‘having three cranes’, MW garan) < *gerano- < PIE *gerh₂no- (Gr.
γέρανος), which previously had to be reconstructed as *tr̥h₁atro-, *tl̥h₂amon-
and *gr̥h₂ano-. The absence of any traces of Joseph's Law in e.g. fem. ā-stems
and weak ā-verbs indicates that it was not triggered by a long *ā. It is also likely
that it did not operate when the *a was word final.
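
As a rough illustration, Joseph's Law with the two restrictions just mentioned can be written as a single conditioned rewrite. This is a minimal Python sketch under stated assumptions: the function name is invented here, R is taken to cover r, l, m and n, forms are written without the asterisk and hyphens, and the string "tera" is a made-up test item used only to show the word-final restriction.

```python
import re
import unicodedata


def josephs_law(form: str) -> str:
    """Rewrite pre-PCelt. *eRa as *aRa when the *a is short and not word-final."""
    # NFC makes sure long ā is a single code point, so a plain "a" in the
    # pattern can only match the short vowel.
    form = unicodedata.normalize("NFC", form)
    # (?=.) requires at least one more segment, so word-final -a is untouched.
    return re.sub(r"e([rlmn])a(?=.)", r"a\1a", form)


for pre_form in ["teratro", "telamū", "gerano", "benā", "tera"]:
    print(pre_form, ">", josephs_law(pre_form))
```

The first three items reproduce the derivations quoted above; benā (the pre-form of ‘woman’ discussed earlier) stays unchanged because its second vowel is long.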

An expanded version of Joseph's Law has recently been proposed by Eugen
Hill (2012). According to Hill, sequences of *eLNa were also affected by this
change. This explains the vocalism of e.g. W sarnu as opposed to OIr. sernaid
as deriving from a paradigm PCelt. *sternū, *starnati (vel sim.), from an older
subjunctive *ster-nh-e/o- (cf. Lat. sterno).


9.2.7 Cluster Simplification 


It is likely that there was a general loss of stops before -sC- (Stifter 2017:
1191–2). This would explain instances such as OIr. tes, MW tes ‘heat’ < PCelt.
*testu- < PIE (Transponat) *tep-s-tu- and OIr. lesc, MW llesc ‘weak; lazy’ <
PCelt. *lesko- < PIE (Transponat) *legʰ-sko- ‘lying down’. In these cases, the
root-final stop was not restored, because the association with the underlying
root was not sufficiently strong. However, when the association with forms with
a preserved root-final consonant was sufficiently strong, the consonant was
typically restored. The restored stop was subsequently subject to the general
neutralization of non-dental stops: before a following *t or *s all
Proto-Indo-European “non-dental” stops (i.e. *ḱ, *ǵ, *ǵʰ, *k, *g, *gʰ, *kʷ, *gʷ,
*gʷʰ, *p, *b, *bʰ) merge as a velar (or uvular) fricative, usually noted *x, as in
PIE *HoḱtoH > PCelt. *oxtū (Gaul. oxtumetos ‘eighth’, OIr. ocht, MW wyth,
MBret. eiz ‘eight’), *septm̥ > PCelt. *sextam (Gaul. sextametos ‘seventh’, OIr.
secht, MW seith, MBret. seiz ‘seven’), PIE (Transponat) *(H)eu̯p-s-elo- (cf. Gr.
ὑψηλός) > PCelt. *ou̯xselo- (Gaul. uxello-, OIr. úasal, MW uchel, MBret. uhel
‘high’). The exact phonemic status of this *x is not entirely clear; it did not
occur in other contexts.
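
The neutralization described above amounts to one context-sensitive rewrite over the whole class of non-dental stops. The sketch below is only an illustration: the function name and the flat transcription (no asterisks or hyphens) are assumptions made here, and the earlier loss and restoration of stops before -sC- is not modelled.

```python
import re

# Any non-dental stop: a velar, palatal or labiovelar (optionally aspirated)
# or a labial (optionally aspirated). Dentals are deliberately left out.
NON_DENTAL_STOP = r"(?:[kgḱǵ]ʷ?ʰ?|[pb]ʰ?)"


def neutralize(form: str) -> str:
    """Merge every non-dental stop into x before a following t or s."""
    return re.sub(NON_DENTAL_STOP + r"(?=[ts])", "x", form)


for pre_form in ["septam", "oktū", "agt", "u\u032fidto"]:
    print(pre_form, ">", neutralize(pre_form))
```

The last item contains a dental before *t and is therefore left unchanged, in line with the separate treatment of *-Dt- described further below.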

In restored clusters of the structure *xsC, the *s was subsequently
lost, probably by regular reduction. This paved the way for the t-preterite
of roots ending in velar stops (as in OIr. pret. -acht ‘drove’, MW aeth,
MBret. aez ‘went’ < *ax-t- < *ax-s-t from the PCelt. pres. *ag-e/o-),
ultimately deriving from an old 3sg. s-aorist. A very similar loss, of both
*s and *x, is observable between liquid and stop, e.g. in OIr. tart ‘thirst’,
MW tarth ‘steam, haze’ < PCelt. *tartu- < *tarstu- < *tr̥s-tu- and MW arth
‘bear’ < PCelt. *arto- < *arxto- < *h₂r̥tḱo-. This reduction explains the
development of the t-preterite to roots ending in liquids, departing from the
original 3sg. s-aorist, *bʰēr-s-t > *bīr-t > *bir-t- (OIr. birt, -bert, MW
kymyrth).

Inherited *st appears to have been preserved as such in Proto-Celtic, as
indicated by its survival in e.g. Celtib. boustom ‘cow stable (?)’ < PCelt.
*bou̯sto- < PIE *gʷou̯-sth₂-o- (Ved. goṣṭhá-) and occasional survival in
Brittonic. In Goidelic, on the other hand, *st has given *ss > s in all positions.
PIE *-Dt-, presumably realized as [tst] and hence indistinguishable from
*-Dst-, was reduced to PCelt. *-ts-. In Insular Celtic, if not earlier, this was
further reduced to *ss, as in PIE *u̯id-to- > *u̯isso- > OIr. -fess ‘is known’,
MW gwys, MBret. gous ‘was known’.
9.2.8 Elimination of PIE *p


The elimination of the phoneme *p is probably the best-known defining sound
change in the Celtic languages. However, in many contexts *p leaves a trace by
merging with other phonemes, e.g. PIE *Vpn > PCelt. *Vu̯n (PIE *su̯op-no- or
*sup-no- > PCelt. *sou̯no- > OIr. súan, MW hun ‘sleep’),² *VpL > *VbL (PIE
*du̯ei-plo- > *du̯eiblo- > OIr. díabul ‘twofold’), *ps > *xs (cf. PCelt. *ou̯xselo-
above), *pt > *xt (cf. PCelt. *sextam above). It is possible that the phoneme /ɸ/
was not completely lost by Proto-Celtic times but that we still had /ɸ/ in the
earliest attested stage. This is primarily based on Lep. uvamokozis, plausibly
analysed as a personal name with a first member /uɸamo-/ from PIE *up-m̥Ho-
‘highest’ (cf. Schumacher 2004: 133–4; Eska 2013). Another indication that
a reflex of PIE *p was still preserved as a distinct phoneme in Proto-Celtic is
provided by the reflex of PIE *sp. In initial position, this cluster yields OIr.
s-, len. f- and Brit. *f-, as in OIr. seir ‘heel’, dual di p[h]erith, W ffêr
‘ankle’ < PCelt. *sɸeret- < PIE *spʰerH- ‘to kick’ (LIV² 585), OIr. selg, MBret.
felch ‘the spleen’ < PCelt. *sɸelgā < PIE *spelǵʰ- (cf. Lat. liēn, Ved. plīhán-,
Gr. σπλήν). This distribution of outcomes does not correspond with that of any
other known initial cluster. We cannot posit PCelt. *su̯- as the outcome of PIE
*sp- because, while PCelt. *su̯- would account for Old Irish s-, len. f- (as in
OIr. siur, len. do phethar < PIE *su̯esor-), it gives *hu̯- in Brittonic (MW
chwaer, MBret. hoar ‘sister’) and cannot therefore have merged with the outcome
of PIE *sp-. Indeed, PIE *sp- appears to be the only regular source of PBrit. *f-
apart from Latin borrowings with f-.


9.2.9 Length Opposition in Consonants


A length opposition in sonants had already developed in pre-Proto-Celtic. The
long sonants came about by assimilation, the most common being the
assimilation of *-sR- to *-RR-, as in Celtib. iomui < PIE *i̯osmōi̯, OIr. coll, OW
coll, OBret. coll-guid ‘hazel-tree’ < *kollo- < pre-PCelt. *koslo-. Hence,
Proto-Celtic acquired an opposition between n and nn, m and mm, l and ll, and
r and rr. PIE postvocalic *-sr- may, however, have yielded *-ðr- instead, as
indicated by Gaul. tiðres, OIr. teoir, MW teir fem. ‘three’ < *tiðres < PIE
*tisres (Schrijver 1995: 448–52). However, *rr developed from other sources,
such as PIE *rs (OIr. carr, MW karr, MBret. carr ‘cart’, Latin carrus from
Gaulish < PCelt. *karro- < PIE *kr̥so-/ḱr̥so-) and possibly *rp.

The phonemic length opposition in sonants is paralleled by a similar oppos- 
ition in stops. Proto-Indo-European did not allow geminate stops, at least not 
outside Lallwörter, but new geminate stops arose at some point in Celtic. This


² Probably only after rounded vowels, cf. PIE (Transponat) *tep-net- > PCelt. *tenet- > OIr.
teine ‘fire’.
happened either by regressive assimilation between two stops across
a morpheme boundary, e.g. PIE (Transponat) *(h₂)ad-k(i)i̯ah₂ ‘“at-ness”,
proximity’ > *akki̯ā > OIr. aicce ‘proximity; fosterage’, MW ach ‘lineage,
ancestry’, or through hypocoristic gemination observable particularly in
personal names.


9.2.10  Lenition of Voiced Stops 


A purely phonetic lenition of the short voiced stops after vowels may possibly be
reconstructed for Proto-Celtic or a Common Celtic period shortly thereafter.³
While this lenition is central to Insular Celtic, operating both word-internally and
across word boundaries as part of grammatical lenition, there is some
evidence in favour of it going back much further. The use of an apparent sibilant
symbol for the outcome of mainly postvocalic *d in Celtiberian (Villar 1995)
may plausibly be interpreted as an indication of phonetic lenition to [ð].
Likewise, the occasional loss of intervocalic *g (as in Celtib. tuateres
‘daughters’ < PCelt. *dugateres) may be an indication of intervocalic /g/ being
realized as [ɣ]. It is likely that this lenition also affected *s (> *h) and
*m (> *β̃), as it did in Insular Celtic. The occasional loss of /s/ in Gaulish
may support this.


9.2.11 Morphological Innovations 


As noted above, in many instances where Brittonic and Goidelic share mor- 
phological innovations, it is difficult to tell whether or not these innovations 
date back to Proto-Celtic, Common Celtic or a later stage common to Goidelic 
and Brittonic. The following non-trivial innovations are likely to date back to 
Proto-Celtic or at least an early Common Celtic stage: 

* Levelling of the pronominal *so-/to- paradigm in favour of the allomorph
with *s-, as evident from e.g. Celtib. dat.sg. somui < PCelt. *sommūi̯ ← PIE
*tosmōi̯, the OIr. 3pl. prepositional dative ending -ib and the PBrit. 3pl.
prepositional ending *-uβ (MW -udd, MBret. -e, -o, -eu)⁴ < PCelt. dat.pl.
*soibis ← PIE *toibʰi(s). This innovation is possibly shared with Italic.

* Loss of the agent noun suffix *-ter-/-tel-: while there are many instances of
the instrument-noun suffix *-tro-/-tlo- in Celtic, the agent-noun suffix *-ter-/
-tel-, from which *-tro-/*-tlo- is plausibly derived, is completely absent from
Celtic. Instead, we find alternative productive suffixes with this function,
such as *-ii̯o- and *-ii̯ati- (abstracted from *CeCH-ti- > *CeC-ati- and
suffixed to *-ii̯o-stems).


³ The Common Celtic period refers to the period after the split-up of Proto-Celtic in which
innovations could still affect all Celtic varieties.

⁴ Cf. the treatment of final unstressed *-luβ ‘herb’ in *tud-luβ ‘navelwort’ > ModBret. tule, tulo,
tulev.
* Elimination of the present and past active participle as part of the verbal
paradigm. The former survives in numerous fossilized nominal formations,
e.g. PCelt. *karant- ‘friend’ (OIr. carae, W car). The past passive participle
is preserved in this function, though typically in the form *-tio- for expected
*-to- in Insular Celtic.

* Merger of the aorist and the perfect into a preterite (cf. the parallel
innovations in the prehistory of Italic and Germanic).

* Loss of the inherited categories of subjunctive and optative (although the
Celtic s- and a-subjunctives and futures may continue PIE s-aorist
subjunctives).

* s-aor. 3sg. *-s-t reanalyzed as *-st- and used as a marker for the past tense.

* Thematization of *es- ‘is’: there is evidence from both Old Irish and Brittonic
that at least some of the present-tense forms of *es- (PIE *h₁es-) were
thematized to *es-e/o-, as described by Schrijver 2020.


9.3 The Internal Structure of Celtic 


The precise internal subgrouping of Celtic is still not entirely settled (cf.
the tentative tree in Figure 9.1). However, it seems fairly clear that
Celtiberian should be contrasted with the more northern varieties (cf.
Schrijver 2015; Eska 2017). This may be demonstrated for instance by
the development of a clitic relative particle *i̯o in Gaulish, Goidelic and
Brittonic as opposed to the fully inflected relative pronoun *i̯o- in
Celtiberian (e.g. dat.sg. iomui) and the transfer of the feminine gen.sg.
ending *-i̯ās from the ī-stems to the ā-stems (Gaul. -ias, OIr. -e as
opposed to Celtib. -as). Celtiberian, conversely, has innovated e.g. by
creating a new o-stem gen.sg. in -o of unclear origin.


[Figure 9.1 The Celtic languages. Tree diagram with the node labels Celtic,
Celtiberian, Lepontic, Gaulish, Gallo-Brittonic(?), Insular Celtic(?),
Brittonic, Welsh, Cornish, Breton and Old Irish.]
9.3.1 Goidelic


Establishing Goidelic as a branch separate from the other Celtic branches is 
unproblematic, Goidelic being easily defined by a long series of sound changes 
differentiating it from Proto-Celtic and resulting in the remarkably uniform Old 
Irish language. Recent chronological overviews of these developments are 
given in Sims-Williams (2003: 296-301) and Stifter (2017: 1198-200). The 
subsequent Goidelic dialects derive more or less directly from Old Irish. 
Among the defining features of Old Irish, we can include the following, in 
rough chronological order (largely following McCone 1996: 105-25; but cf. 
e.g. Kortlandt 1997 and Isaac 2007: 97-113): 

* Fronting and raising of PCelt. *a to *æ before tautosyllabic nasals. This *æ
may subsequently be further raised to e/i or é (when lengthened) or revert
back to a, the conditions for this being debated (cf. McCone 1992; Schrijver
1993). Word final *-an (as in the nom.-acc.sg. of neuter n-stems and the
acc.sg. of ā-stems and consonant stems) is also affected by this, giving *-en
which usually causes palatalization when lost by apocope.

* *o > *a in final syllables.

* VNT > V(ː)D, i.e. loss of nasals before voiceless obstruents (PCelt. *k, *kʷ, *t,
*s, *x), with voicing of a following stop and frequently with compensatory
lengthening of a preceding front vowel, e.g. PCelt. *kanto- ‘100’ > *kænto- >
OIr. cét /k'e:d/, PCelt. *krenxtV- > OIr. crécht ‘wound, scar’ (MW creith,
MBret. creizenn).

* Postvocalic lenition of voiceless stops to the corresponding voiceless
fricatives.

* Raising and lowering of short vowels caused by the height of the vowel in the
following syllable.

* Several rounds of palatalization, whereby consonants are palatalized by front
vowels under different circumstances. The front vowels may subsequently be
lost (by syncope or apocope) or reduced to schwa, causing the palatalization
to become phonemic.

* Loss of the rounding of the reflexes of PCelt. *kʷ, *gʷ and merger with the
plain velars. In some cases, the rounding may be transferred to
a following vowel, as in PCelt. *kʷrimi- (MW pryf) > OIr. cruim /kruμ’/
‘worm’.

* Apocope of vowels in absolute auslaut. Long vowels followed by
a consonant are shortened but preserved.

* Loss of fricatives before sonants with compensatory lengthening or
u-diphthongization of the preceding vowel, e.g. *(ɸ)etnos > *eθnah > OIr.
én ‘bird’ (MW edyn, MBret. ezn), gen.sg. *(ɸ)etnī > *eθnī > éuin. PCelt. *tr >
θr is not affected by this change, cf. PCelt. *aratrom (MW aradyr, MBret.
arazr) > *araθran > OIr. arathar ‘plough’.

* Syncope of vowels in even-numbered, medial syllables, after the operation of
apocope.

* Initial, unlenited *u̯- > OIr. f- (presumably /ɸ-/), PCelt. *u̯iro- > OIr. fer.


9.3.2 Brittonic


The existence of a Brittonic subgroup distinct from Goidelic is uncontroversial,
even if the position of Brittonic is itself contested. Much like Goidelic,
Brittonic underwent a fundamental transformation in the early Medieval
period. Unlike Old Irish, however, where the inherited nominal and verbal
morphology remained largely intact, albeit in a much-altered guise, the sound
changes in Brittonic resulted in massive restructurings in inflectional
morphology, leading to a complete loss of the nominal case system. The singular/
plural opposition has also been partially restructured: in some nouns, typically
ones that would often occur in the plural, the underived form has plural (or
“collective”) meaning, with the singular (or “singulative”) being formed by
a Proto-Brittonic suffix *-ɪnn (masc.), *-enn (fem.), as in e.g. coll. *gu̯ɪð ‘trees’,
sglt. *gu̯ɪð-enn ‘a tree’ (W gwydd, gwydden, MBret. guez, guezenn) < PCelt.
*u̯idu- (OIr. fid).

These defining sound changes took place after the introduction of the main 
body of Latin loanwords, since these loanwords are generally affected by the 
same changes as inherited vocabulary. Many of the changes may be due to 
contact with early Gallo-Romance, such as voicing of postvocalic voiceless 
stops, penultimate stress, loss of phonemic vowel length and the loss of the 
neuter gender, cf. Schrijver 2002. The phonological changes leading from 
Proto-Celtic to Old Welsh, Old Cornish and Old Breton have been treated in 
great detail by Jackson 1953: 265-699, Schrijver 1995 and Sims-Williams 
2003. Chronological overviews are given in Jackson 1953: 694–99 and
Stifter 2017: 1200-1. 

* The Proto-Celtic voiced geminate stops appear to have been devoiced in
Brittonic (cf. Pedersen 1909: 159–61) and subsequently fricativized
regularly, just like the Proto-Celtic voiceless geminates (cf. spirantization below).
This is borne out by e.g. PCelt. *biggo- (OIr. bec /b'eg/) > *bikko- > PBrit.
*bix-an (dimin. suff. *-an; W bychan, Bret. bihan ‘little’), PCelt. *kloggā
(OIr. cloc, clog /klog/) > *klokkā > PBrit. *klox (W cloch, Bret. kloc’h ‘bell’),
PCelt. *u̯raggā (Ir. frac, frag /frag/?) > *u̯rakkā > PBrit. *u̯rax (W gwrach,
Bret. gwrac’h ‘hag’), PCelt. *buggo- (OIr. bog ‘gentle, tender’) > *bukko- >
Brit. *bux (ModBret. bouc’h ‘blunt’; Bret. bouk ‘soft’ must instead be
a borrowing from Irish). This change also accounts for the development of
PCelt. *zd (*ðd?) in Brittonic, which appears to have gone through *dd (OIr.
/d/) to *tt and ultimately to PBrit. *θ. This may be exemplified by e.g. PCelt.
*nizdo- > *niddo- (O/MIr. net, ned /N'ed/) > *nitto- > PBrit. *niθ (W nyth,
Bret. neizh ‘nest’). The change must predate the creation of geminate stops
by assimilation of preverbs and verbs and the univerbation of verbal
compounds such as PCelt. *kred-dī- ‘believes’ (MW credu, MBret. cridiff, OIr.
creitem ‘to believe’).

* Restructuring of the vowel system: fronting of PCelt. *ū > *ī (causing
a merger with PCelt. *ī) and monophthongization of the remaining
diphthongs: PCelt. *oi̯ > *ō (merging with the reflex of PCelt. *ou̯) > *ū, PCelt.
*ai̯ > *ɛ̄ and PCelt. *au̯ > *ɔ̄ (possibly via *ā).

* Final a-affection: a Proto-Celtic long *ā in the final syllable lowers
a preceding PCelt. *i, *u to *e, *o. This may be observed in feminine
ā-stems, especially in adjectives in Middle Welsh, where the lowering has
become a mark of the feminine, e.g. nom.sg.masc. *u̯ind-os, nom.sg.fem.
*u̯ind-ā > MW masc. gwynn, fem. gwenn ‘white’.

* Lenition of postvocalic voiceless stops to the corresponding voiced ones, e.g. 
PCelt. *bratir > PBrit. *brodr ‘brother’ (MW brawd, MBret. breuzr), PCelt. 
*dekam > PBrit. *deg ‘ten’ (MW dec, MBret. dec /deg/). 

* Nasalization of voiced stops, ND > NN, as in PCelt. *landā (MIr. land, Gaul.
*landā → Fr. lande ‘heath’) > MW llan ‘enclosure; church’, MBret. lann.
This also operates in syntactically close external sandhi and gives rise to the
limited Brittonic nasal mutation.

* Fixed stress on the penultimate syllable. With apocope (see below), the stress 
came to fall on the final syllable. 

* Final i-affection, whereby a short vowel is raised and/or fronted by a final *-ī
and *-i̯o-. After apocope, a new round of i-affection takes place, this time
caused by high front vowels still remaining after apocope.

* Apocope of all final syllables. In contrast to Goidelic, even long vowels 
followed by consonants are lost. 

* Syncope of immediately pretonic vowels in open syllables. 

* Spirantization or “second lenition”, whereby previously unlenited voiceless
stops become voiceless fricatives after vowels and non-homorganic
fricatives (thus Schrijver 1999; for a different chronology, see Sims-Williams
2007: 43–58). This includes former geminate stops and stops protected from
the first lenition in external sandhi. It is possible that this development was 
sufficiently late to have developed differently in Welsh and South-West 
Brittonic: in external sandhi, spirantization only seems to occur after vowels 
in Welsh, while in South-West Brittonic it also appears to take place after 
non-homorganic sonants. 

* The new quantity system, whereby the inherited phonemic vowel length was 
lost. However, this did not entail any large-scale merger, since older phon- 
emic contrasts in length were shifted to quality (e.g. PCelt. *7> *; as opposed 
to PCelt. *i > *7, PCelt. *i > *# as opposed to PCelt. *u > *u and PCelt. *a/au 
> *5 > *5 as opposed to PCelt. *o > *o) or preserved by diphthongization of 
the old long vowels (PCelt. *ei̯ > *ē > PBrit. *uɪ and PCelt. *ai̯ > *ɛ̄ > *oɪ).
The new quantity system only has allophonic vowel length, with stressed
vowels being long before single consonants and in word-final position and
short elsewhere.

A number of changes take place in the course of the Old British transmission
but are nevertheless shared by all Brittonic branches, e.g.:

* Initial, non-lenited *u̯- > *gu̯-, as in PCelt. *u̯iro- > PBrit. *u̯ur > MW
gwr ‘man’.

* Accent retraction from the final to the penultimate syllable. It is unclear 
whether the final stress of Vannetais Breton is the result of a later forwards 
shift due to French influence or if it represents an archaism, with the Proto- 
Brittonic final stress being preserved due to a higher proportion of French 
speakers in this region. 

Though it may seem surprising at first glance, given the geographical proximity
of Cornwall to Wales and its relatively long distance from Brittany, there is
a fair amount of evidence in favour of a distinct South-West Brittonic branch
consisting of Cornish and Breton to the exclusion of Welsh (cf. Hamp 1953,
Jackson 1953: 19-25 and passim, Schrijver 2011: 15-33).


9.3.3 The Position of Brittonic: Gallo-Brittonic or Insular Celtic? 


The position of Brittonic in the Celtic family tree remains an unsolved question, 
specifically whether we should posit an Insular Celtic node consisting of 
Brittonic and Goidelic to the exclusion of Gaulish (as e.g. McCone 1992; 
Schrijver 1995: 463-5; Eska 2017), a Gallo-Brittonic node excluding 
Goidelic (as Koch 1992) or a dialect continuum with a fundamental three-way 
split, allowing Brittonic to share innovations with both Gaulish and 
Goidelic (thus e.g. Sims-Williams 2007: 34). 

Among the potential Gallo-Brittonic innovations are the following: 

* *kʷ > *p in Gaulish, Lepontic and Brittonic, as in PCelt. *ekʷo- ‘horse’ > 
Gaul. Epona ‘name of a goddess’, MW ebawl ‘foal’ (cf. OIr. ech 
‘horse’). 

* A change of *oRa to *aRa, i.e. an expanded Joseph's Law, seems to occur in 
Brittonic and Gaulish, as shown by MW taran ‘thunder’, Gaul. Taranis as 
opposed to OIr. torann ‘thunder’. 

* PCelt. *sr- > *fr- seen in e.g. PCelt. *srognā (OIr. srón ‘nostril’) > Brit. 
*froɣn (W ffroen, MBret. froan ‘nostril’), Gaul. *frognā (whence OFr. 
frongne ‘scowl, frown’). 

* Devoicing of the voiced geminate stops: we may assume that this change also 
took place in Gaulish, thus providing us with a potential Gallo-Brittonic 
isogloss. This is based on the evidence of PCelt. *kʷezdi- ‘bit, piece’ > 
*kʷeddi- (OIr. cuit /kud’/) > Gallo-Brit. *pettia (Gaulish > LLat. *pettia > 
Fr. pièce, etc.), Brit. *peθ (W peth, Bret. pezh) and PCelt. *bozdo- ‘knob’ > 
*boddo- (MIr. bot /bod/ ‘tail; membrum virile’) > Gallo-Brit. *botto- 
(Gaulish > LLat. *bottu- > Fr. dial. bo, bout ‘hub of a wheel’, bouton 
‘button’), Brit. *boθ (W both ‘hub of a wheel’), cf. Delamarre 2003: 93, 
249–50. Even if one does not accept a general devoicing of voiced geminates 
in Gallo-Brittonic, the specific development of PCelt. *zd to *tt still 
constitutes an isogloss. 

* Thematization of feminine consonant stems: a few consonant stems, preserved 
as such in Goidelic, seem to have been transferred to the feminine ā-stems 
in Brittonic and Continental Celtic. The trigger for the transfer was 
probably the Proto-Celtic acc.sg. *-am (< PIE *-m̥) and acc.pl. *-ās (< PIE 
*-m̥s), which had become identical to the endings of the feminine ā-stems. 
Examples include PCelt. *abū, *abon-am (OIr. aub, abainn) → *abon-ā (MW afon, 
MBret. auon ‘river’, Gaul. *abonā ⇒ Fr. Avosnes, name of a village), PCelt. 
*brix-s, *brig-am (OIr. brí ‘hill’) → *brig-ā (MW bre, MBret. bre ‘hill’, 
Gaul. -briga in numerous place names) and PCelt. *brus-ū, *brunn-am (OIr. 
brú, broinn ‘belly; womb’) → *brunnā (MW bron, MBret. bronn ‘breast’, 
Gaul. *brunnā, possibly reflected in Modern Gallo-Romance, cf. von 
Wartburg 1928: 566). 

The list of potential shared innovations between Gaulish and Brittonic may 
not be particularly impressive, yet it should be noted that it is difficult to point 
to any significant Gaulish innovations not shared with Brittonic. One might 
even pose the question as to whether Brittonic could simply be seen as 
continuing a dialect of Gaulish. Such a scenario would require the following 
Insular Celtic innovations to have resulted from a later Sprachbund-type 
situation: 

* The absolute/conjunct opposition, whereby many finite verbal forms have 
longer endings when in initial position of the verbal syntagm. This is in all 
likelihood an innovation of “Insular Celtic”, brought about by the 
generalization of a main clause verbal particle *et(i), which occupied the second 
position of the clause, protecting the verbal endings from reduction when the 
verb is clause initial (Schrijver 1994; 1997: 147–58; Schumacher 1999; 
2004: 90–114). 

* Striking similarities in the system of verbal morphology, particularly 
with regard to the formation of compound verbs, perfective particles and 
infixed pronouns. There is little to no evidence for this from Continental 
Celtic. 

* Analogical replacement of *uer ‘over’ (PIE *uper) with *uor (OIr. for, MW 
gor-, gwar- as opposed to Gaul. uer-), probably under the influence of the 
antonym *uo ‘under’ (PIE *upo). 


It is conceivable that the Old Irish paradigm Nsg. brú, Gsg. bronn reflects a remodelled 
PCelt. *brunn-s, *brunn-os with the Nsg. levelled after the oblique cases, rather than the 
spectacularly archaic and irregular PCelt. *brus-ū, *brun-n-os. Irrespective of this, the 
ultimate origin of either Proto-Celtic paradigm must be a PIE (Transponat) *bʰrus-ō, 
*bʰrus-n-os. 


9.4 The Relationship of Celtic to the Other Branches 


With the exception of Italic (cf. Chapter 7), no branch of Indo-European appears 
to share a significant number of isoglosses with Celtic, at least not to the extent 
that a plausible case can be made for a shared post-Proto-Indo-European stage. 

Admittedly, there does appear to be a special connection between Celtic and 
Germanic to the exclusion of other branches. However, the shared features are 
almost exclusively lexical in nature (for Dybo's Shortening see Section 9.2.3), 
either the existence of a root, e.g. *tegu- ‘fat, thick’ (OIr. tiug, W tew; OE þicce, 
OHG dicki), *magu- ‘boy, servant’ (OIr. mug, MCorn. maw; Goth. magus, OE 
magu), or a specific semantic development encountered only in these two 
branches, such as *priH-o- ‘beloved’ (Ved. priyá-) > ‘free’ (W rhydd, Goth. 
freis, OHG frī). The absence of any securely identified innovations in the realm 
of inflectional morphology between Celtic and Germanic makes it very likely 
that this relatively impressive collection of lexical isoglosses is due to 
borrowing. 

Apart from lexical isoglosses, there are a few apparent shared innovations 
which are worth mentioning: 

* A notable syntactic correspondence between Hittite and Old Irish is the use of PIE 
*nu (Hitt. nu, OIr. no) as a sentence-initial particle. In Old Irish, this is done in 
order to provide a preverb to which a clitic pronoun can be attached, while in 
Hittite it functions as a sentence-connecting particle to which clitics may be 
suffixed. 

* Another notable syntactic correspondence is between Celtic and Tocharian. 
This is the development of the PIE adverb *(h₁)eti (Lat. et, Goth. iþ, Ved. áti) 
to a clitic obeying Wackernagel's Law. In Insular Celtic this yields the main 
clause particle *et(i), which blocks lenition of the following element and is 
responsible for the emergence of the absolute/conjunct allomorphy in the finite 
verb; in Tocharian B it produces -s ‘and’, a clitic connector following the 
first word of the clause (Hackstein 2005: 176). 


9.5 The Position of Celtic 
See Chapter 7. 


One could consider *magu- to be a late borrowing from Germanic to Common Celtic. This would 
allow us to reconstruct pre-Germ. *mh₂k-u- ‘a reared one’ > PGerm. *magu-, from the PIE root 
*mah₂k- ‘to rear, to nourish’. The inherited Celtic cognate would be PCelt. *makʷo- ‘boy, son’, 
continuing a thematized *mh₂k-uo-. 


10 Germanic 


Bjarne Simmelkjær Sandgaard Hansen & 
Guus Jan Kroonen 


10.1 Introduction 


Germanic languages are spoken by about 500 million native speakers. They 
constitute a medium-large subgroup of the Indo-European language family and 
were originally located in Northern Europe, owing much of their current distribu- 
tion to the recent expansion of English. From a historical perspective, notable old 
Germanic languages were Gothic, Old Norse, Old English, Old Frisian, Old 
Saxon, Old Franconian (poorly attested) and Old High German (Bousquette & 
Salmons 2017: 387-8). Gothic, mainly known from a fourth-century Bible trans- 
lation, continued to be spoken in a local variant in Crimea until the late eighteenth 
century but subsequently went extinct (Nielsen 1981: 283-8). The remaining old 
Germanic languages developed into modern varieties such as English, Frisian, 
Dutch, German and the Nordic languages (Henriksen & van der Auwera 1994). 

However, Northern Europe must have witnessed speakers of some form of 
Germanic even prior to the attestation of these old Germanic languages. Runic 
inscriptions in a language that we may label Early Runic appear from 
the second century onwards, and one inscription on a fourth-century-BCE 
bronze helmet, the Negau B helmet, has been unearthed in Slovenia. This 
inscription, which is in a northern Etruscan alphabet and reads 
hariyastiteiwa, constitutes our earliest evidence of Germanic, at least if we 
follow Markey (2001) in interpreting it as ‘Harigast the priest’.¹ It thus 
constitutes a terminus ante quem for some of the linguistic features that define 
Germanic (Section 10.2). 


10.2 Evidence for the Germanic Branch 


In this section, we shall list some of the most important diagnostic features of 
Germanic within the realms of phonology and morphosyntax, which constitute 
the most reliable means for establishing linguistic clades (see Section 2.3). 


1 With harigast containing the Germanic words for ‘army’ and ‘guest’ and teiwa as a linguistic 
precursor of the Nordic theonym Týr. Alternatively, Must (1957: 55-7) sees a Rhaetic name 
consisting of Venetic and Etruscan elements in this inscription. 


10.2.1 Phonology 


As indicated by Ringe (2017: 113-27, 147-50 and Section 4.3), all Germanic 
languages display reflexes of the outputs of the following phonological 
innovations.² Three of these, no. 1, 4 and 5, are frequently said to constitute 
the most striking hallmark of the Germanic languages, i.e., “what to a large 
extent defines Germanic” (Kroonen 2013: xxvii). 

1. Rask/Grimm's Law I: PIE *p t ḱ/k kʷ > fricatives *f þ h hw unless an 
obstruent immediately preceded, e.g. Goth. fadar ‘father’ ~ Ved. pitā́, Gr. 
πατήρ; Goth. þreis ‘three’ ~ Ved. tráyaḥ, Gr. τρεῖς. 

2. Verner’s Law: *f þ s h hw > *β ð z ɣ ɣw if not word-initial, if not adjacent to 
a voiceless sound, and if the last preceding syllable nucleus was not 
accented; e.g. Goth. fadar ‘father’ (PGmc. *fader- < PIE *ph₂tér-) ~ 
Goth. broþar ‘brother’ (< PGmc. *brōþer- < PIE *bʰréh₂ter-). 

3. Kluge's Law: PIE *-Pn- -Tn- -Kn- > *-bb- -dd- -gg- (Kluge 1884; Lühr 
1988; Kroonen 2011), e.g. OE liccian, OS likkon, OHG leckōn ‘to lick’ < 
PGmc. *likkōn- < PIE *liǵʰ-neh₂-. 

4. Rask/Grimm's Law II: PIE *b d ǵ/g gʷ > *p t k kw (including *bb dd gg > 
*pp tt kk) (succeeding no. 3), e.g. Goth. twai ‘two’ ~ Ved. dváu, Gr. δύο; 
Goth. aukan ‘increase’ ~ Lat. augeō ‘increase’, Lith. áugti ‘grow’. 

5. Rask/Grimm's Law III: PIE *bʰ dʰ ǵʰ/gʰ gʷʰ > fricatives *β ð ɣ ɣw, e.g. OS 
nebal ‘fog’ ~ Ved. nábhas-, Gr. νέφος; Goth. daur ‘door’ ~ Gr. θύρα, Lat. 
forēs. 

6. *β ð ɣ ɣw > *b d g gw after homorganic nasals, and *β ð > *b d 
word-initially. 

7. Shift of stress to the first syllable of the word. 

8. Simplification of geminates after heavy syllables, e.g. PGmc. *wīsa- ‘wise’ 
(OHG wīs) < *wīssa- < PIE *u̯eid-to-; *deupa- ‘deep’ (Goth. diups) < 
*deuppa- < PIE *dʰeubʰ-no-. 
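
Why the relative order of these innovations matters can be shown with a small sketch. The Python below is a toy model only, resting on several assumptions that are not part of the text: words are given as pre-segmented lists of symbols, palatovelars are written as plain velars, only innovations no. 1 and no. 4 are modelled, and the names `grimm_1`, `grimm_2` and `apply_in_order` are hypothetical.

```python
# Toy model of two ordered sound changes (innovations no. 1 and no. 4 above).
# Assumptions: the input is a list of segments; palatovelars are written as
# plain velars; vowels, laryngeals and accent are left untouched.

GRIMM_1 = {"p": "f", "t": "þ", "k": "h", "kʷ": "hw"}   # voiceless stops > fricatives
GRIMM_2 = {"b": "p", "d": "t", "g": "k", "gʷ": "kw"}   # voiced stops > voiceless stops
OBSTRUENTS = set(GRIMM_1) | set(GRIMM_2) | {"s"}


def grimm_1(segments):
    """Shift voiceless stops to fricatives unless an obstruent immediately precedes."""
    out = []
    for i, seg in enumerate(segments):
        prev = segments[i - 1] if i > 0 else None
        blocked = prev in OBSTRUENTS
        out.append(GRIMM_1[seg] if seg in GRIMM_1 and not blocked else seg)
    return out


def grimm_2(segments):
    """Shift voiced stops to voiceless stops in every position."""
    return [GRIMM_2.get(seg, seg) for seg in segments]


def apply_in_order(segments, rules):
    """Apply each rule once, feeding the output of one rule into the next."""
    for rule in rules:
        segments = rule(segments)
    return segments


three = ["t", "r", "e", "y", "e", "s"]   # simplified stand-in for the 'three' word
two = ["d", "w", "o"]                    # simplified stand-in for the 'two' word
stand = ["s", "t", "a"]                  # st- cluster: the stop is protected by s

list_order = [grimm_1, grimm_2]          # no. 1 before no. 4, as in the list
reversed_order = [grimm_2, grimm_1]      # hypothetical counter-order

print(apply_in_order(three, list_order))     # initial t becomes þ
print(apply_in_order(two, list_order))       # initial d becomes t and stays t
print(apply_in_order(two, reversed_order))   # d > t > þ: the two series merge
print(apply_in_order(stand, list_order))     # unchanged
```

In this sketch the counter-order lets the output of the voiced-stop shift feed the fricativisation, so the two stop series would fall together; keeping them distinct requires no. 1 to have ceased operating before no. 4 applied.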

As Ringe (Section 4.3) also mentions, the collocation of these innovations 
reduces the likelihood of them having taken place individually in each 
language – and thus the likelihood of these languages not emanating from 
a common predecessor – to practically zero. However, the list does not 
confine itself to these eight innovations. We may add at least a handful of 
further innovations, most of which concern the development of the inventory 
of stressed vowels. Examples include 


2 Innovations no. 3 and 8 are not mentioned by Ringe (2017). However, we have included them 
here to demonstrate the full range of the interdependency of these phonological innovations. The 
sequence of innovations no. 1-5 is disputed. Some adherents of the glottalic theory (e.g. 
Kortlandt 1991: 3) have Verner's Law (no. 2) predate both Kluge's Law (no. 3) and Rask/ 
Grimm's Law (no. 1, 4 and 5). 

3 In view of PGmc. *seuni- ‘sight, vision’ (Goth. siuns, ON sjón, etc.) < *seɣʷni- < PIE *sekʷ-ni-, 
the occurrence of Kluge's Law must postdate innovation no. 2. 


1. Merger of post-laryngeal-colouring PIE *a o ə > *a, e.g. OHG haso ‘hare’ ~ 
Ved. śáśa- (< *śása-), OPru. sasins (< PIE *ḱas-); Goth. gasts ‘guest’ ~ Lat. 
hostis ‘enemy’ (< PIE *gʰostis); Goth. fadar ‘father’ ~ Ved. pitā́, Gr. πατήρ 
(< PIE *ph₂ter-). 

2. Merger of post-laryngeal-colouring PIE *ā ō > *ā. 

3. *ā > *ō,⁴ e.g. Goth. sokjan /sōkjan/ ‘seek’ ~ Lat. sāgīre; Goth. bloma 
/blōma/ ‘flower’ ~ Lat. flōs. 

4. PIE *r̥ l̥ m̥ n̥ > *ur ul um un, e.g. Goth. baurgs /burgs/ ‘city’ ~ Av. bərəz- 
‘high, hill, mountain’, OIr. brí (brig-) ‘hill’ (< PIE *bʰr̥ǵʰ-); Goth. fulls ‘full’ 
~ Ved. pūrṇáḥ, Lith. pìlnas (< PIE *pl̥h₁nos). 

5. Holtzmann's Law: PIE *-i̯- -u̯- > PGmc. *-jj- -ww- under some conditions.⁵ 


10.2.2 Morphosyntax 


Morphosyntax, too, provides a range of compelling evidence that classifies the 
Germanic languages as belonging to a separate branch. A morphological 
innovation that may count as one of the defining hallmarks of Germanic is 
the rise of its verbal system. All old Germanic languages share a verbal system 
consisting of three subsystems:⁶ 

* ablauting “strong verbs” with a present stem predominantly continuing the 
Proto-Indo-European thematic presents and a preterite stem continuing the 
Proto-Indo-European perfect, e.g. Goth. bind-an ‘bind’, band ‘I/he bound’, 
bund-um ‘we bound’ 

* non-ablauting “weak verbs” with present stems of varying sources and 
preterite stems formed with a suffix containing a dental consonant, mostly 
in the form of reflexes of PGmc. *ð, e.g. Goth. haus-j-an ‘hear’, haus-i-da 
‘I/he heard’ 

* “preterite-present verbs” where the present is formed as the strong-verb 
preterites and the preterite as the weak-verb preterites, e.g. Goth. kann ‘I/ 
he can’, kunn-um ‘we can’, kun-þa ‘I/he could’ (see also Section 10.5.2) 

Although most of the building blocks of this verbal system are reflected in other 
Indo-European languages and thus continue Proto-Indo-European elements and 
processes, their regrammation and reparadigmatisation into a coherent system is 

4 Together with a few other loanwords, Gothic rumoneis ‘Romans’ shows that innovation no. 2 
is a necessary intermediary step and that no. 3 must have happened after the acquaintance of the 
Germanic-speaking peoples with Latin. The source word, Lat. rōmānī, has had its ō rendered as ū 
(probably because innovation no. 2 caused absence of ō in the Germanic/pre-Gothic vowel 
system) and its ā rendered as ō (probably because the word was borrowed prior to innovation 
no. 3) (Noreen 1894: 11-12; Ringe 2017: 171; contested by Stifter 2009: 270-3). 

5 The exact conditioning remains debated, most likely involving either adjacency to laryngeals 
(Hoffmann 1976: 651; Jasanoff 1978; Rasmussen 1990, 1999) or pretonic position (Kluge 1879: 
128; Kroonen 2013: xxxviii-xl); see also Section 10.3.4. 

6 In addition to these three subsystems, we find some mixed verbs and a handful of irregular verbs. 


Table 10.1 Adjectival definiteness 


Content      modifier of noun phrases (adjective)    individualising or characterising noun 
             > modifier of non-definite noun         > modifier of definite noun phrases 
             phrases 

Expression   reflexes of mainly PIE a-/ō-            reflexes of the PIE n-stem type 
             adjectival stems (= strong adj.)        (= weak adj.) 


a purely Germanic innovation.⁷ So is one of the building blocks: the dental suffix 
found in the preterite stem of the weak verbs (Meid 1971: 107-11; Rasmussen 
1996/1999; Ringe 2017: 191–4; differently Lühr 1984; Kortlandt 1989). 

The system of strong and weak adjectives (Ringe 2017: 313-15) constitutes 
another regrammation of inherited building blocks that is highly characteristic of 
Germanic. Continuing mainly PIE a-/ō- and n-stems, respectively, they are not 
innovations formally speaking. However, the regrammation and 
reparadigmatisation of the function of these nominal stems (see Table 10.1) is 
truly innovative, as is the intrusion of pronominal endings in the 
strong-adjective paradigm. 

Finally, degrammation and, in particular, deflection are phenomena often 
associated with the Germanic branch. Many of the Proto-Indo-European 
inflectional categories have been lost in Germanic, e.g. the aspectual system and the 
subjunctive mood (Ringe 2017: 177, 182–6). Others are on the verge of being 
lost, e.g. the mediopassive, the dual, and the vocative, ablative, locative and 
instrumental cases. Having arisen independently in the Germanic languages, 
however, these latter deflectional processes do not characterise Germanic as 
such. For instance, the vocative is still attested residually in Gothic, likewise 
the instrumental in Old High German, Old Saxon and Old English, and Early 
Runic may display one instance of a noun in the ablative with ablatival function 
(Hansen 2016: 10–16). Thus, while the linguistic structures that would trigger 
this deflection may very well have been present in Proto-Germanic, the 
processes themselves occurred individually. 


10.3 The Internal Structure of Germanic 


It is traditionally assumed that the Germanic languages split into three 
sub-branches (Schleicher 1860; Streitberg 1896; Hirt 1931; etc.): 

* East Germanic: the long-extinct Gothic language, Crimean Gothic and several 
other languages, likewise long-extinct, for which we have little or no 
evidence apart from toponyms and ethnonyms, e.g. Vandalic and Burgundian 


7 For the application of the terminology of grammation, regrammation and degrammation and 
the connections between grammaticalisation and paradigmatisation, see Andersen 2006; 
Nørgård-Sørensen & Heltoft 2015. 

* North Germanic: the modern-day Nordic languages Icelandic, Faroese, 
Norwegian, Elfdalian, Swedish and Danish and their immediate predecessors 
as well as the now-extinct language varieties of Norn 

* West Germanic: English, Frisian (West, East and North), Dutch, Low German, 
High German and their various dialects, derivations and predecessors.⁸ 


10.3.1 East Germanic 


Linguistic traits and developments specific to East Germanic include, within 
the realm of phonology, the raising of PGmc. *ǣ to ē (Goth. mena /mēna/ 
‘moon’ ~ ON máni, OHG māno), the devoicing of word-final PGmc. *-z > -s 
(Goth. fisks ‘fish’ ~ ON fiskr, OHG fisc), the development of word-final PGmc. 
*-ō > -a (neuter a-stem nom./acc.pl. Goth. -a ~ ER -u, ON -∅ᵘ,⁹ OHG -u/-∅) 
and the absence of a-, i- and u-mutation (Goth. wulfs ‘wolf’ ~ OHG wolf; Goth. 
gasts ‘guest’ ~ ON gestr, OE giest). 

Within the realm of morphology, innovations include paradigmatic levellings 
of the results of Verner's Law (Section 10.2.1) in favour of the unvoiced 
variant (Goth. wairþan–warþ–waurþum–waurþans ‘become’ ~ OE 
weorþan–wearþ–wurdon–worden) and the creation of a deictic demonstrative pronoun 
Goth. sah ‘this’ (with -h < PIE *-kʷe ‘and’). We also see several instances of 
retention, e.g. of the reduplication in the preterite of reduplicated strong verbs 
(Goth. haitan–haihait–haihaitum–haitans ‘call’), of four classes of weak verbs 
and partially of the grammatical categories of dual and mediopassive in the 
inflection of verbs. 


10.3.2 North Germanic 


If we turn to North Germanic (Nielsen 2000: 255–65), some of the most salient of 
the many phonological innovations include loss of word-initial PGmc. *j- (ON 
ungr ‘young’ ~ Goth. juggs /jungs/, OHG jung) and of word-initial *w- before 
rounded vowels (ON ulfr ‘wolf’ ~ Goth. wulfs, OHG wolf), assimilation of PGmc. 
*-ht- > -tt- (ON nótt, nátt ‘night’ ~ Goth. nahts, OHG naht), loss of word-final 
nasals (ON bera ‘carry’ ~ Goth. bairan), rise of i- and u-mutation with subsequent 
syncope or shortening of the mutation-causing unstressed vowel (PGmc. *gastiz 
‘guest’ > ON gestr ~ Goth. gasts)¹⁰ and breaking of stressed PGmc. *e > ja and jǫ 
when the following syllable contained a and u prior to the aforementioned 


8 Scholars such as Robinson (1992: 11–12), Nielsen (2000) and Bousquette & Salmons (2017: 
389) express minor reservations concerning the unity of the West Germanic branch. 

9 The superscript u signifies u-mutation on the vowels of the preceding syllable(s). 

10 Similar, though not entirely identical, processes have taken place in the West Germanic 
languages (Section 10.3.3). 

syncope, respectively (ON jafn ‘even, equal’ ~ Goth. ibns ‘even, level, flat’, OHG 
eban ‘even, equal’; ON jǫrð ‘earth, soil’ ~ Goth. airþa /ɛrþa/, OHG erda). 

On the morphological level, most of the traits that characterise North 
Germanic concern the loss of some of the grammatical categories that were 
partially preserved elsewhere, e.g., the instrumental case or the dual and the 
mediopassive in the inflection of verbs. Others are true innovations such as the 
creation of a new personal pronoun in the third person (ON hann ‘he’, hon 
‘she’), the replacement of the pres.3sg. ending -þ with pres.2sg. -ʀ > -r and the 
grammaticalisation of verbs plus reflexive pronouns into a new passive voice. 


10.3.3 West Germanic 


The traits and developments that justify the assumption of a West Germanic unity 
(Nielsen 2000: 241–7) include several innovations shared with North Germanic 
(Section 10.3.4). Others are not shared with North Germanic, e.g. phonological 
processes such as the gemination of all consonants except r in front of *j (PGmc. 
*hafja- ‘hold up, bear up, lift’ > OS hebbian ~ Goth. hafjan, ON hefja) (Krahe 
1966: 95–6),¹¹ the gemination of obstruents in front of prevocalic *r and 
*l (PGmc. *bitra- ‘sharp, bitter’ > OS bittar; PGmc. *apla- ‘apple’ > OS 
appul), the rise of i-mutation with subsequent partial syncope or shortening of 
the mutation-causing unstressed vowel (PGmc. *gastiz ‘guest’ > OE giest ~ Goth. 
gasts)¹² and the loss of word-final PGmc. *-z in unstressed syllables prior to its 
merger with regular r (PGmc. *fiskaz ‘fish’ > OHG fisc ~ Goth. fisks, ON fiskr). 
The replacement of the original strong-verb pret.2sg. ending (formed by 
adding -t to the preterite singular stem) with a new one (formed by adding -i to 
the preterite plural stem; OHG bāri ‘you carried’ ~ Goth. bart, ON bart; Krahe 
1967: 100-3), the creation of an inflected infinitive (OHG beranne (dat.) ‘to 
bear’; Krahe 1967: 113) and the retention of reflexes of the irregular verbs 
PGmc. *dō- ‘do’, PWGmc. *gā- ‘go’ and *stā-¹³ ‘stand’ (Krahe 1967: 137-40) 
constitute some of the most salient arguments from the realm of morphology. 


10.3.4 Intermediary Subgroupings 


It is beyond the scope of this chapter to delve into the further sub-branching of 
these three main sub-branches of Germanic, for which we refer to seminal 
works such as Nielsen (2000) instead. Rather, we shall discuss whether these 
three sub-branches arose simultaneously through one single ternary split or 
came into being through sequences of binary splits. We must therefore decide if 


11 North Germanic also geminates k and g in front of j (PGmc. *legja- > ON liggja ‘lie’), but the 
West Germanic process applies to a much broader range of cases. 

12 Earlier in English (and Frisian) than in High and Low German (Krahe 1966: 59). 

13 PWGmc. *stā- ← *stō- (< PIE *steh₂-) by analogy with *gā- (< PIE *ǵʰeh₁-). 

any two branches are exclusive in sharing (preferably non-trivial) phonological 
and morphological innovations that cannot have arisen separately in each 
branch. 

Of the three possible combinations that could theoretically have existed, we 
may discard the East-West vs. North Germanic one.¹⁴ Aside from the use of the 
derivational nominal suffix PGmc. *-Vssu- (Goth. -(in)assus, OHG -(n)issi), 
East and West Germanic share no linguistic innovations that are not also shared 
by North Germanic. The remaining linguistic traits shared only by East and 
West Germanic all constitute shared archaisms and are thus not diagnostic. 

The assumption of another of the constellations, that of an initial binary split 
into North-East Germanic and West Germanic, once gained some popularity 
among Germanicists (Maurer 1942; Schwarz 1951; Rösel 1962; Lehmann 1966: 
14–19; etc.) under the name Gotho-Norse theory. This split is supported by four 
(Schwarz 1951: 144-8) or five (Maurer 1952: 67-8) shared innovations, of 
which only one (Agee 2021: 337–8) may hold any diagnostic potential in a 
sub-branching discussion: the Verschärfung (i.e., occlusification) of PGmc. *-jj- and 
*-ww- > Goth. ddj, ON ggj and Goth. ggw, ON ggv, respectively, as opposed to 
the retention of these geminates in West Germanic where they surface as *-j- and 
*-w- (Goth. twaddje ‘two’ (gen.), ON tveggja ~ OHG zweio; Goth. triggws 
‘trustworthy’, ON tryggr ‘trustworthy, faithful’ ~ OHG triwi). However, as 
claimed by Rasmussen (1990/1999: 383-4), the Verschärfung process may 
actually have been initiated already in Proto-Germanic, and West Germanic 
may have undergone a subsequent “Entschärfung” process affecting both the 
reflexes of PIE *-iH- and *-uH- and examples such as OHG reia ‘female roe’, 
OE rǣge < PGmc. *raig-jō-. In addition, although seemingly non-trivial, the 
phonological development of Verschärfung finds approximate parallels in 
Faroese (Árnason 2011: 31-3) and in the transition from Latin to Romance 
(Agee 2021: 338). Thus, even if the development is not trivial, it is at least 
not unparalleled. 

We now turn to the possibility of a North-West Germanic unity as opposed to 
East Germanic. More than twenty linguistic innovations appear to be shared by 
North and West Germanic (Agee 2021: 336). Some of these may be trivial, e.g. 
the lowering of PGmc. *ǣ to *ā (ON máni ‘moon’, OHG māno ~ Goth. mena), 
the development of word-final PGmc. *-ō (via ER -u) > -∅ᵘ (neuter a-stem 
nom./acc.pl. ON -∅ᵘ, OHG -∅/-u ~ Goth. -a), the rise of a-mutation (PGmc. 
*hurna- ‘horn’ > ON horn, OHG horn)¹⁵ and perhaps even the rhotacism of 


14 For a contrasting view, see Kortlandt (2000). 

15 Crimean Gothic forms like reghen ‘rain’ and boga ‘arch; bow’ seem to suggest that parts of East 
Germanic partook in the process of a-mutation (Nielsen 1981: 296), thereby projecting this 
development back to Proto-Germanic times. The absence of short e and o in Gothic words 
whose North and West Germanic cognates have undergone a-mutation could then be due to the 
general Gothic merger of PGmc. *i and *e into *i along with an unverifiable, but structurally 
expected merger of PGmc. *u and *o into *u. 

PGmc. *z (> R) > r (PGmc. *maizan- ‘more’ > ON meiri, OE māra) (Kümmel 
2007: 80–1). On the other hand, we may not reasonably label as trivial the 
creation of a new deictic demonstrative pronoun by adding the enclitic particle *-si 
to the inherited demonstrative pronoun (Runic Danish sasi /sasi/ ‘this’, OHG dese; 
Krahe 1967: 64–6) and the analogical replacement of reduplication in strong verbs 
by the secondary diphthong PGmc. *-ea- ~ *-ia- also known as *ē² (ON lét, OHG 
liaz ‘let’ ~ Goth. lailot). The latter process in particular consists of so many 
subprocesses that it would be inconceivable to claim independent developments 
in North and West Germanic. In addition, although many of the remaining shared 
innovations may indeed be trivial, the sheer number of instances in itself suggests 
a period of North-West-Germanic unity. Finally, seeing that Early Runic partakes 
in all the innovations common to both North and West Germanic but none of those 
specific to East Germanic (Nielsen 2000: 77-202, 271-98, esp. 287-93), we may 
safely infer that, by the time of the earliest attestations of Early Runic in the second 
century CE, the East Germanic branch had split off from the Germanic dialect 
continuum, prior to its dissolution into North and West Germanic. 
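
The pairwise comparison carried out in this section can be summarised in a small tally. The Python sketch below is illustrative only: it encodes a handful of the innovations named above (not the full set of more than twenty), treats each as cleanly present or absent per sub-branch, ignores the Crimean Gothic caveat on a-mutation, and uses hypothetical names (`INNOVATIONS`, `exclusive_counts`).

```python
from itertools import combinations

# Innovations named in this section, mapped to the sub-branches that show them.
# The selection is illustrative, not a complete inventory.
INNOVATIONS = {
    "Verschärfung of *-jj-, *-ww-": {"East", "North"},
    "derivational suffix *-Vssu-": {"East", "West"},
    "lowering of the long front vowel to *ā": {"North", "West"},
    "a-mutation": {"North", "West"},
    "rhotacism of *z": {"North", "West"},
    "demonstrative with enclitic *-si": {"North", "West"},
    "replacement of reduplication by a secondary diphthong": {"North", "West"},
}
BRANCHES = ("East", "North", "West")


def exclusive_counts(innovations, branches):
    """Count, for each pair of branches, the innovations shared by exactly that pair."""
    counts = {pair: 0 for pair in combinations(sorted(branches), 2)}
    for sharers in innovations.values():
        pair = tuple(sorted(sharers))
        if pair in counts:          # innovations of one or all three branches are skipped
            counts[pair] += 1
    return counts


counts = exclusive_counts(INNOVATIONS, BRANCHES)
for pair, n in counts.items():
    print(pair, n)
```

A raw count of this kind says nothing about triviality; the argument above weights the demonstrative pronoun and the replacement of reduplication far more heavily than, say, rhotacism.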

On a final note, we shall review an alternative subgrouping scenario. As 
mentioned by Agee (2021: 344), there may still be some dialectal exchange in 
the years immediately following a split. If we choose to assign diagnostic value to 
the Verschärfung process, after all, and if the language varieties that would 
develop into the three Germanic sub-branches once coexisted in a common dialect 
continuum, nothing therefore prevents East and North Germanic from having 
shared innovations such as the Verschärfung at an even earlier point in time. In 
such a unified tree-wave model, the initial split of Proto-Germanic is defined by 
the first innovation (i.e., the Verschärfung) not shared by all its descendants, 
because it did not reach the entire dialect continuum. Between this initial split 
and the final split, which defines the exit of a dialect from the dialect continuum 
and thus the establishment of a separate sub-branch, the numerous innovations 
common to North and West, but not East Germanic, could have taken place. 

Such an approach, which allows for both divergence and convergence, is 
also compatible with Agee's (2021) recent glottometric calculations, which 
operate with degrees of subgroupiness rather than absolute, clear-cut splits. He 
thus (Agee 2021: 335-8, 343) posits a high subgroupiness value for North- 
West Germanic (c = 20.04) as opposed to a low value for North-East Germanic 
(c = 0.032), indicating not only that North-West Germanic is indeed a tightly 
knit subgroup but also that the original dialect-continuum situation may have 
allowed for one shared North-East Germanic innovation. 
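
How a "degree of subgroupiness" can be computed may be sketched as follows. The text cites only the resulting values, so the definitions used here are an assumption: they follow the formulation commonly given for Historical Glottometry, in which cohesiveness is the share of relevant innovations that support a subgroup and subgroupiness is the number of exclusively shared innovations multiplied by that cohesiveness. The function names and the input counts are hypothetical and are not Agee's data.

```python
def cohesiveness(supporting, conflicting):
    """Share of innovations involving the subgroup that are compatible with it.

    supporting:  innovations shared by all members of the subgroup
    conflicting: innovations shared by some members and by outsiders
    (Assumed definition; not spelled out in the text.)
    """
    total = supporting + conflicting
    if total == 0:
        raise ValueError("no innovations involve this subgroup")
    return supporting / total


def subgroupiness(exclusive, supporting, conflicting):
    """Exclusively shared innovations weighted by cohesiveness (assumed definition)."""
    return exclusive * cohesiveness(supporting, conflicting)


# Hypothetical counts chosen only to show the behaviour of the measure.
tight = subgroupiness(exclusive=4, supporting=4, conflicting=1)   # 4 * 4/5
loose = subgroupiness(exclusive=1, supporting=1, conflicting=4)   # 1 * 1/5
print(round(tight, 3), round(loose, 3))
```

Under these assumed definitions a subgroup scores highly only if it has many exclusive innovations and few cross-cutting ones, which is why a single shared innovation against a large body of conflicting evidence yields a value close to zero.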

In sum, two credible models for the disintegration of Germanic present 
themselves. Either we must dismiss Verschärfung as a common North-East 
Germanic innovation and assume a North-West Germanic unity vis-à-vis East 
Germanic (as in Figure 10.1), or we must assume the existence of a Germanic 
dialect continuum in which North Germanic could have shared innovations 
with first East, then West Germanic prior to the final split (as in Figure 10.2).¹⁶ 


Figure 10.1 A tree model illustrating a binary split of Proto-Germanic into 
North-West vs. East Germanic 


Figure 10.2 A unified tree-wave model of the Germanic dialect continuum 
[Labels in the figure: Germanic; W Germanic; N Germanic; E Germanic; 
Verschärfung; NW Germanic innovations: PGmc. *ǣ > *ā, replacement of 
reduplication in strong verbs by a secondary diphthong, PGmc. *z > *R (> *r), etc.] 


10.4 The Relationship of Germanic to the Other Branches 


Just as Germanic split into its sub-branches (Section 10.3), it has itself split off 
from Proto-Indo-European at a given point. Beyond the early divergence of 
Anatolian and Tocharian (Chapters 5 and 6), the relative order of the disinte- 
gration of Proto-Indo-European, including the sequence of the splits leading to 
Germanic, is difficult to establish, however. To solve the riddle, we must 
attempt to define with which other branches Germanic shares diagnostic linguis- 
tic traits, i.e., preferably non-trivial shared phonological and morphological 
innovations (see Section 2.3). 


16 The latter model assumes that an initial split within the Germanic dialect continuum (marked by 
the Verschärfung) is followed by the numerous North-West Germanic innovations (such as 
PGmc. *ǣ > *ā and the analogical replacement of reduplication in strong verbs by a secondary 
diphthong) within other parts of the continuum and subsequently a final split into North, East 
and West Germanic. 

One possibly high-node innovation that Germanic shares with several other 
so-called centum branches (Italic, Celtic, Hellenic and maybe Tocharian; see Krahe 
1966: 11-12; Fortson 2010: 58-9, 403) is the merger of Proto-Indo-European 
palatovelar and plain velar plosives into plain velars (PIE *(d)ḱm̥tóm ‘100’ > 
PGmc. *hunda-, Gr. ἑκατόν, Lat. centum). In contrast, the so-called satem 
branches (Indo-Iranian, Armenian, Balto-Slavic and maybe Albanian; see 
Fortson 2010: 59) merge Proto-Indo-European labiovelar and plain velar plosives 
into plain velars and develop the palatovelar plosives further into sibilants (PIE 
*(d)ḱm̥tóm ‘100’ > Ved. śatám, Av. satəm, Lith. šim̃tas). The geographical 
distribution of centum and satem branches indicates, however, that only the latter 
group was truly linguistically innovative. The branches of the peripheral areas thus 
merely reflect the original situation, with the exception of a trivial merger of 
palatovelars and plain velars that could easily have happened separately and 
independently in each branch and, at any rate, must have happened independently 
in Tocharian vis-à-vis the western centum branches. 

The centum-satem distinction aside, scholars have suggested close phylo- 
genetic relationships between Germanic and a range of other languages. The 
most frequent suggestions set up a Germano-Italo-Celtic unity (Meillet 1984: 
131-2; Porzig 1954: 213) or, less frequently, a Germano-Balto-Slavic unity 
(Schleicher 1853: 787; Stang 1972; also considered as one among several 
constellations by Meillet 1984: 132 and Porzig 1954: 214). Other scholars 
venture into larger groupings such as an “alteuropäisch” group consisting of 
Germanic, Celtic, Italic, Venetic, Illyrian, Baltic and possibly Slavic (Krahe 
1954: 48-63; 1962: 287-8; 1966: 13-14; modified by Schmid 1968), primarily 
based on hydronymic evidence; a “North-West Indo-European” group consist- 
ing of Italic, Celtic, Germanic and Balto-Slavic (Oettinger 1997, 1999, 2003); 
or a general “central” group consisting of Germanic, Balto-Slavic, Indo- 
Iranian, Armenian, Greek and probably Albanian (Ringe 2017: 6-7). 
However, these larger groupings are generally based on shared lexical (and 
derivational) rather than phonological and morphological innovations, which 
would constitute a more reliable means for establishing linguistic clades (see 
Section 2.3). In principle, chances are therefore high that these innovations 
result from post-split convergence. 

In Sections 10.4.1–7, we shall go through the branches with which Germanic
is exclusive in sharing specific phonological and morphological features.


10.4.1 Italic 


Apart from a vast number of lexical innovations, some of which are also shared 
with Celtic (e.g. Goth. munþs ‘mouth’ ~ Lat. mentum ‘cheek’, W mant ‘jaw’),
Germanic shares a handful of innovative phonological and morphological 
features with Italic (Porzig 1954: 106-17, 123-7; Krahe 1966: 15-17, 20-1). 
First among the shared Germano-Italic phonological innovations is the
development of PIE *-TT- > *-ss- (e.g. pre-PGmc. *u̯id-(dʰi)dʰeh₁-t > PGmc.
*wissē(þ) ‘he knew’; PIE *sed-tó- > Lat. sessus ‘seated, sitting’), which may
also have been shared with Celtic (Meillet 1984: 57-61; Porzig 1954: 76-8).
Second comes the back-vowel quality of the vowel developed in front of
Proto-Indo-European syllabic liquids (PIE *r̥, *l̥ > Lat. or, ol, Goth. ur, ul).

The remaining relevant innovations are morphological. Germanic and Italic
show some conformity as regards both the present-stem formation and the
function of derived factitive verbs in PIE *-eh₂-i̯e- (Germanic class II weak
verbs ~ Latin 1st conjugation) and stative verbs in PIE *-eh₁-i̯e- (Germanic class
III weak verbs ~ part of the Latin 2nd conjugation, e.g. OHG dagen ‘be silent’ ~
Lat. tacere). Within numeral and adverbial word formation, Germanic and Italic
share two innovative derivative suffixes with identical meanings: the creation of
distributive numerals from multiplication adverbs by means of the suffix
post-PIE *-no- (*du̯is-no- ‘double, of two times’ > ON tvennr ‘double’, Lat. bini ‘two
by two’) and the creation of ablatival local adverbs in post-PIE *-tr-ōd (Goth.
utaþro ‘from outside’; Osc. contrud ‘against’).

To the extent that Venetic can be proved to constitute a separate Italic sub- 
branch rather than an independent Indo-European branch (Section 8.2), we note 
two innovations of Germanic shared with Venetic in this chapter (Porzig 1954: 
128; Krahe 1966: 17-18): the addition of post-PIE *g to the 1sg.acc. of the
personal pronoun PIE *mé ‘me’ due to analogy with the 1sg.nom. *eg- ‘I’ (e.g.
Goth. mik ‘me’, Ven. meχo modelled after Goth. ik ‘I’, Ven. eχo) and the
creation of an identity pronoun post-PIE *selbʰo- ‘self’ (Goth. silba, Ven.
sselb-).¹⁷ However, because these two Germano-Venetic innovations are not
shared with all Italic subbranches, they must either be independent innovations 
in Germanic and Venetic or result from convergence between Germanic and 
Venetic after the initial breakup of Italic. 

In a similar vein, granted an Italo-Celtic cladistic node (Section 7.2), the non- 
participation of Celtic in the Germano-Italic innovations poses serious chal- 
lenges to the assumption of such a subbranch and suggests that these innovations 
rather result from secondary convergence after the breakup of Italo-Celtic. 


10.4.2 Celtic 


Germanic and Celtic had a long period of intensive contact (Porzig 1954:
118–27; Krahe 1966: 18-20; Bousquette & Salmons 2017: 390). Their high number
of shared lexical innovations concentrated in certain semantic domains such as 


17 The existence of somewhat similar analogies in the personal pronouns in Anatolian (e.g. Hitt.
nom. uk ‘I’, acc. ammuk ‘me’) and in Greek (ἐμέγε; see Whatmough 2015: 164) strengthens the
suspicion that at least this innovation is trivial and may have happened independently in
multiple branches (Porzig 1954: 191).
religion and warfare (Hyllested 2009: 117–18, 122) serves as solid evidence
thereof. So do a number of indisputable Celtic loanwords in Germanic, e.g. PIE
*h₃rēǵ- ‘king’ > PCelt. *rīg- > PGmc. *rīk-. However, it is often difficult to
decide whether a given Germano-Celticism is a shared innovation (or
archaism) or reflects a loanword relationship in either direction, as exemplified by
PIE *h₃reǵ-tu- > PCelt. *rextu-, PGmc. *rehtu- ‘justice’ (Schmidt 1984, 1986;
Hyllested 2009: 107).

Notwithstanding the quantity of these lexical isoglosses or their quality for 
reconstructing a period of Germano-Celtic neighbourhood and convergence, 
they remain lexical only. Apart from the uncertainties regarding the participa- 
tion of Celtic in the development of PIE *-TT- > *-ss- (Section 10.4.1), 
Germanic shares no exclusive phonological and morphological innovations 
with Celtic (Porzig 1954: 123; Hyllested 2009: 108-9). The evidence for 
a common Germano-Celtic branch is therefore scanty. 


10.4.3 Illyrian, Messapic and the Remaining Balkanic Branches 


As with both Italic and Celtic, the vast majority of shared innovations between 
Germanic, on the one hand, and Illyrian and Messapic, on the other, are lexical, 
e.g. Goth. þiudans ‘king’ ~ Illyr. Teutana (personal name), but a couple of
morphological innovations exist as well (Porzig 1954: 127-31; Krahe 1966:
18). Only with Illyrian and partially with Greek does Germanic share the
generalisation of the ō-grade in the declension of feminine n-stems (Goth. nom.sg.
tuggo /tungo/, gen.sg. tuggons /tungons/ ‘tongue’ ~ Illyr. nom.sg. Aplo, gen.sg.
Aplonis (personal name)). The formation of possessive pronouns with the suffix
*-no- attached to the locative of the personal pronouns is shared with Messapic
(e.g. post-PIE *su̯eino- ‘his, her’ > Goth. seins, Mess. veinan (acc.)).

Shared innovations between Germanic and the remaining Balkanic branches 
of Thracian, Albanian and Hellenic are limited to a handful of lexical corres- 
pondences, most of which are also shared with Illyrian (Porzig 1954: 138-9). 
The only exceptions are the trivial phonological development of PIE *sr > str in 
Germanic and Thracian-Albanian, which is, however, also shared with Illyrian, 
Brythonic, Slavic and partly Baltic (e.g. ON straumr ‘stream’ ~ Thrac.
Στρυμών (river name), Illyr. Stravianae, Strevintia (place names), Lith. strovė
‘stream’; see Porzig 1954: 78-9; Krahe 1966: 22), and the equally trivial
Germanic and Albanian merger of PIE *a and *o into *a, which may also be
shared with (Balto-)Slavic (Meillet 1984: 54–6; see also Section 10.4.4).


10.4.4 Balto-Slavic 


Most of the innovations shared between Germanic and Balto-Slavic are lexical
(e.g. PGmc. *strela- ‘arrow’ ~ Lith. strėlė ‘arrow, shoot’, OCS strěla ‘arrow’;
see esp. Stang 1972 and Nepokupnyj et al. 1989). Four major exceptions from
the realm of phonology and morphology come to mind, though (Porzig 1954:
139-47; Krahe 1966: 21–2).

First, and most famously, Germanic and Balto-Slavic agree in forming the
dative and instrumental plural with a suffix reflecting a PIE *-m- rather than the
*-bʰ- found in the remaining Indo-European branches (PGmc. dat.pl. *-imiz as
per the Germanic theonyms Aflims and Vatvims in Roman inscriptions, Lith.
dat.pl. -ms, instr.pl. -mis, OCS dat.pl. -mъ, instr.pl. -mi ~ Ved. dat.-abl.pl.
-bhyaḥ, instr.pl. -bhiḥ, Lat. dat.-abl.pl. -bus, Gaul. dat.pl. -bo, Gr. instr.pl. -φι;
see also Porzig 1954: 90-1). A recent study by Adams (2016: 19-22) indicates
that Tocharian belongs to the m-group, its ablative ending Toch.B -meṃ
reflecting pre-Toch. *-mons, i.e., the PIE dat.-abl.pl. *-mos with *n inserted
analogically from the acc.pl. as in OPru. -mans. To Olander (2015: 269–70), the
*m of Germanic and Balto-Slavic (and Tocharian) represents a phonological
innovation of PIE *-bʰi- > post-PIE *-m-. Other scholars, however, regard
the m-cases as archaic rather than innovative and the *m/bʰ isogloss as
a result of different levellings of an original distribution between
dative/ablative plural in *m and instrumental plural in *bʰ (Hirt 1895; Beekes 2011: 188;
see also Section 15.4.1).¹⁸

Second, Germanic and Baltic agree on forming the numerals ‘11’ and ‘12’ in
a highly non-trivial way by compounding the numerals ‘1’ and ‘2’ with the
reflex of PIE *-likʷo- ‘left’ (Goth. ainlif ‘11’, twalif ‘12’ ~ Lith. vienuolika ‘11’,
dvylika ‘12’). The meaning has probably developed along the lines of ‘one left
after counting to 10’ (11) and ‘two left after counting to 10’ (12).

The third innovation is phonological. In both Germanic and Balto-Slavic, the
inherited vowel qualities PIE *a and *o merge into *a. Since the Slavic
development of *a > o is demonstrably late (Meillet 1984: 54), this
Germano-Balto-Slavic merger would seem uncontroversial with the short vowels (e.g.
PIE *poti- ‘master’ > Goth. (bruþ-)faþs ‘bridegroom’, Lith. pats ‘husband,
self’; see Meillet 1984: 54-6).¹⁹ However, the Baltic merger of *o and *a must
postdate Winter's Law, since PIE *nogʷ- > PBalt. *nōg- > Lith. núogas ‘naked’
(not *nogʷ- > †nag- > †nāg- > Lith. †nógas). The long vowels also require
closer investigation. First, the merger of the long vowels only affects parts of
the Germano-Balto-Slavic area, since Baltic keeps the reflexes of PIE *ā and *ō


18 For a review of earlier literature on this matter, see Olander (2015: 267-8).

19 This short-vowel merger also affects Albanian (Section 10.4.3). According to some scholars
(e.g. Luraghi 1998: 174), Anatolian partakes, as well, but as Melchert (1993: 251) demonstrates, 
this merger did not affect Lycian, in which PIE *o merged with *e instead of *a. Thus, it must 
constitute a secondary shared innovation in Hittite, Palaic and Luwian. In a similar vein, the 
existence of Brugmann’s Law, which accounts for the different developments of short PIE 
*a and *o in open syllables in Indo-Iranian, witnesses that the identical merger in this branch 
must also have happened posterior to its separation from the remaining Indo-European 
branches. 
apart (PIE *steh₂- > Lith. stóti ‘stand up’ ~ PIE *népōt- ‘grandson’ > OLith.
nepuotis). Second, we must accept an intermediary stage of a merged
pre-Proto-Germanic *ā that later develops into PGmc. *ō as posited in
Section 10.2.1. No matter how many branches the mergers of short and
long PIE *a and *o cover, one fact remains: both mergers represent trivial
processes of phonological change and may just as easily have taken place
independently in each branch.

Fourth and last, Germanic, Slavic and to some extent Baltic share the equally
trivial insertion of *t into the cluster PIE *sr with Thracian, Illyrian and
Brythonic (Section 10.4.3).

As a parallel to the case of shared Germano-Italic innovations affecting only 
the Venetic part of Italic, or only the Italic part of Italo-Celtic (Section 10.4.1), 
the fact that Germanic shares innovations with only parts of the Balto-Slavic 
unity weakens the assumption of an early Germano-Balto-Slavic cladistic 
node. Being the sole non-trivial innovation shared by all Germanic and Balto- 
Slavic (and Tocharian?) sub-branches, only the oblique cases in PIE *-m- may 
potentially support such an assumption, though with some major potential 
reservations (Section 15.4.1). The remaining non-lexical innovations could 
have either happened independently in each branch or arisen due to conver- 
gence at a period when Germanic, Baltic and Slavic had all developed into 
individual branches. Thus, it is not surprising that Pronk (Section 15.4.1) 
dismisses the idea of such a common Germano-Balto-Slavic node. 


10.4.5 Armenian 


The only innovation uniting Armenian and Germanic is their treatment of the 
Proto-Indo-European system of plosives. Both branches have undergone 
‘consonant shifts’ by changing the articulatory manner of the plosives in 
similar ways (Meillet 1984: 89-96; Porzig 1954: 80-2; see also 
Section 10.2.1 for an account of the Germanic developments). The voiced 
aspirates (PIE *bʰ dʰ ǵʰ gʰ gʷʰ) developed into unaspirated voiced plosives
(and/or fricatives), the voiced unaspirated plosives (PIE *b d ǵ g gʷ) into
unvoiced plosives and, finally, the unvoiced unaspirated plosives (PIE *p t ḱ
k kʷ) into unvoiced aspirates. These unvoiced aspirates are predominantly
retained as such in Armenian (PIE *t ḱ k/kʷ > Arm. tʿ cʿ kʿ) but have
developed further into fricatives in Germanic (PIE *p t ḱ/k kʷ > PGmc.
*f þ h hw) and partially in Armenian, too (PIE *p > Arm. h).
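
The Germanic side of this shift can be sketched as a simple lookup table. The sketch below is illustrative only: the outcomes of the unvoiced series follow the passage above, whereas the outcomes listed for the voiced and voiced aspirated series are simplified textbook values assumed here (fricative variants and later changes such as Verner's Law are ignored), and all identifiers are hypothetical.

```python
# PIE plosive series, ordered by place of articulation:
# labial, dental, palatovelar, plain velar, labiovelar.
VOICELESS = ["p", "t", "ḱ", "k", "kʷ"]
VOICED = ["b", "d", "ǵ", "g", "gʷ"]
ASPIRATED = ["bʰ", "dʰ", "ǵʰ", "gʰ", "gʷʰ"]

# Proto-Germanic outcomes. The first row is taken from the passage
# (PIE *p t ḱ/k kʷ > PGmc. *f þ h hw); the other two rows are
# simplified textbook assumptions, not values given in the passage.
PGMC_FROM_VOICELESS = ["f", "þ", "h", "h", "hw"]
PGMC_FROM_VOICED = ["p", "t", "k", "k", "kw"]
PGMC_FROM_ASPIRATED = ["b", "d", "g", "g", "gw"]

SHIFT = {}
SHIFT.update(zip(VOICELESS, PGMC_FROM_VOICELESS))
SHIFT.update(zip(VOICED, PGMC_FROM_VOICED))
SHIFT.update(zip(ASPIRATED, PGMC_FROM_ASPIRATED))


def shift(segment):
    """Return the assumed Proto-Germanic outcome of one PIE segment.

    Segments that are not plosives (vowels, resonants, *s) pass through
    unchanged in this toy model.
    """
    return SHIFT.get(segment, segment)


def shift_word(segments):
    """Apply the shift to a word given as a list of segments."""
    return [shift(s) for s in segments]


def series_stay_distinct():
    """Check that the three manner series do not merge at any place.

    A chain shift changes the manner of each series while keeping the
    three-way contrast intact at every place of articulation.
    """
    for i in range(len(VOICELESS)):
        outcomes = {shift(VOICELESS[i]), shift(VOICED[i]), shift(ASPIRATED[i])}
        if len(outcomes) != 3:
            return False
    return True
```

Under these assumptions, `shift_word(["p", "e", "d"])` returns `["f", "e", "t"]`, and `series_stay_distinct()` confirms that the shift rearranges the three series without collapsing them, which is the sense in which the change is a consonant shift rather than a merger.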

Although these developments are indeed substantial, they may still have
occurred independently in the two branches in question. As Meillet (1984:
93–6) mentions, such consonant shifts are trivial innovations with parallels in
several other language families worldwide, e.g. Aramaic and some Bantu
dialects, and Porzig (1954: 81–2) questions whether the developments in
Germanic and Armenian are really as parallel as they seem to be at first
glance.


10.4.6 Tocharian


The apparent participation of Tocharian in the group of languages that
select m-variants of the dative/ablative and instrumental plural case endings
(Adams 2016: 19-22; see Section 10.4.4 for a detailed treatment) may position
it firmly together with Germanic and Balto-Slavic. Additional parallels 
between Germanic and Tocharian are limited to lexical elements (Porzig 
1954: 97-8, 182-7). 


10.4.7 Anatolian 


Apart from allegedly both grouping together with Italic and Tocharian in
expanding the function of the reflexes of the interrogative pronoun PIE
*kʷo-/kʷi- to include the function of a relative pronoun (Puhvel 1994: 318),²⁰ Anatolian
and Germanic only share lexical isoglosses. Even if some among these
isoglosses are indeed striking and highly specialised (e.g. ON herðar ‘shoulder
blades’ ~ Hitt. kakkartani ‘shoulder blade’; Goth. ulbandus ‘camel’ ~ Hitt.
huwalpant- ‘hunchback’; Puhvel 1994: 323-4; Melchert 2016: 298-300), they
remain lexical and thus less fit for cladistic purposes than phonological and 
morphological aspects. 


10.5 The Position of Germanic 


As demonstrated in Section 10.4, no branch offers itself as an obvious
candidate for sharing a common node with Germanic in the Indo-European cladistic
tree. We could tentatively choose to see the *-m-variant of the secondary cases
(Section 10.4.4) or the collocation of the Germanic 2nd and 3rd classes of weak 
verbs with the Latin 1st and 2nd conjugation (Section 10.4.1) as evidence in 
favour of a cladistic partnership with Balto-Slavic and Tocharian or with Italic, 
respectively. However, these pieces of evidence obviously point in different 
directions, and as for the Balto-Slavic connection, other pieces of evidence 
show shared innovations with Baltic only, not with Slavic, which indicates 
a period of contact and joint development between Germanic and Balto-Slavic 


20 The evidence for Germanic partaking in the innovation of expanding the function of the reflexes 
of the interrogative pronoun is meagre, to say the least. The Germanic languages form their 
primary relative pronouns in three different ways. East Germanic applies the demonstrative
pronoun followed by an enclitic particle -ei; North Germanic, an indeclinable particle er or es;
and West Germanic, the demonstrative pronoun alone (Krahe 1967: 68-9; see also Porzig 
1954: 191). 
languages during a relatively late time period and, in any event, after the initial
breakup of Balto-Slavic. The same goes for the Germano-Italic innovations 
that are not also shared with Celtic and thus must postdate the initial breakup of 
Italo-Celtic. Two linguistic arguments may, however, be presented in favour of 
a relatively early split of Germanic. 


10.5.1 Nominal Ablaut 


A well-known, seemingly archaic feature of the Germanic branch is its preser- 
vation of Proto-Indo-European nominal ablaut, especially in the heteroclitics. 
Here we may recall cases such as PGmc. nom. *sōel (Goth. sauil, ON sól), obl.
*sunn- (Goth. dat. sunnin, ON sunna) ‘sun’ < PIE *seh₂-ul, gen. *sh₂-u(é)n-s
and the somewhat parallel PGmc. nom. *fōr (cf. Goth. fon, OHG fuir, fiur
‘fire’), obl. *fun- (Goth. gen. funins) < PIE *péh₂-ur, *ph₂-u(é)n-s. With the
exception of Anatolian, such nominal ablaut patterns are far less well preserved in
the other branches. Although vestiges of these patterns exist throughout the
family (Lith. vanduõ ~ Latv. ūdens ‘water’ < PIE *u̯(o)d-r/n- and Lat. iecur,
gen. iocineris ‘liver’ < PIE *i̯e/okʷ-r/n-), Germanic appears rather conservative in
this respect.

Additional indications for such inherited productivity in Germanic come 
from a related nominal category, the n-stems. There is ample evidence for 
inherited ablaut patterns in this category, e.g. PIE *kréit-ō, obl. *krit-n-
‘fever’ (OHG nom. rido ~ dat. riten), PIE *meh₂k-ō, obl. *mh₂k-n- ‘poppy’
(OSw. val-moghe ~ OHG maho, mago); see further MW cryd < PIE *krito-
and Gr. μήκων < PIE *meh₂k-ōn-. In other n-stems, however, the ablaut
appears to be decidedly secondary. A possibly secondary full grade
presents itself in, e.g., Nw. dial. jase ‘hare’ (< ON *hjasi < PGmc. *hesan-) as
opposed to pan-Gmc. *hasan- ~ *hazan- (OHG haso, OE hara) and,
outside Germanic, Ved. śaśá-, Lat. cānus (< *kasno-) (< PIE *ḱas-).
Secondary zero grades must in turn be assumed for PGmc. *maþō, obl.
*mutt- ‘maggot, moth’ (Goth. maþa ~ ON motti) and *raþō, obl. *rutt-
‘rat’ (OHG rato ~ MLG rotte), apparently from pre-PGmc. *mot-n- and
*(H)rot-n- (Kroonen 2011: 218–23). The Indo-European nominal ablaut is
thus not merely preserved in the Germanic n-stems, but seems to have 
remained productive, a feature long lost in most other branches. 


10.5.2 The Preterite-Presents 


A second archaic characteristic of Germanic is the retention of the verbal 
category that is generally held to somehow correspond to the Anatolian 
hi-presents: the Germanic preterite-presents (Section 10.2.2). Examples 
include 
* PGmc. *waita–witume ‘know’ > Goth. wait–witum, ON veit–vitum
* PGmc. *maga–magume ‘can’ > Goth. mag, ON má–megum
* PGmc. *aiha–aigume ‘own, have’ > Goth. aih–aigum, ON á–eigum
* PGmc. *kanna–kunnume ‘can’ > Goth. kann–kunnum, ON kann–kunnum
* PGmc. *mana–munume ‘think’ > Goth. man, ON man–munum
* PGmc. *skala–skulume ‘shall, must’ > Goth. skal, ON skal–skulum

The reconstruction of this category for Proto-Indo-European is debated. 
Opinions differ as to whether it was a conjugational type of its own or 
rather originally identical with the perfect (see Kloekhorst 2018 for a 
discussion). 

Regarding the lexical distribution of this class, some of the verbs have
parallels in Indo-European languages other than Germanic, e.g. PGmc. *magan- ~
OCS mogǫ (< PIE *mogʰ- ‘be able’); *munan- ~ Gk. μέμονα ‘has in mind’ (< PIE
*(me-)mon-); PGmc. *aigan- ~ Ved. īśe ‘avail over’ (< PIE *(h₂i-)h₂iḱ-; see
Hansen 2015); PGmc. *ōgan- ~ OIr. ágathar (< PIE *h₂e-h₂ogʰ- ‘fear’), yet
others are isolated to Germanic, even though they contain more widely attested
verbal roots, e.g. PGmc. *kunnan- (< PIE *ǵneh₃- ‘know’),²¹ *lisan- (< PIE
*leis- ‘track’), *ga-nahan- (< PIE *Hneḱ- ‘reach’) and *skulan- (< PIE *skel-
‘owe’). It is tempting to conclude, as a result, that the Germanic
preterite-presents, whatever their ultimate origin, were still a productive verbal category
when Germanic split off from Proto-Indo-European. This is more reminiscent of
the situation in Hittite, where the hi-conjugation is still a fully functioning verbal
category, than of the situation in the remaining Indo-European branches, where it
has largely disappeared and can only be traced through isolated remnants.


10.5.3 Conclusion 


Exactly how early Germanic split off remains exceedingly difficult to
determine. While Germanic is generally a highly innovative
Indo-European sub-branch and lost many of the Proto-Indo-European features
still present in Vedic and Greek, the sustained productivity of (1) nominal
ablaut and (2) the preterite-presents can be taken as “living fossils”.²²
Perhaps then, these are potential indications that Germanic split off from
PIE at a relatively early stage, as these features are generally lost in the
non-Anatolian branches. Based on this interpretation, we may surmise that
Germanic broke off from Proto-Indo-European after Anatolian and just
before or after Tocharian.


21 The double n of *kann- ~ *kunn- suggests that it was innovated on the basis of the neh₂-present
PGmc. *kunnō- < PIE *ǵn̥h₃-neh₂-, which is well-attested outside Germanic (Toch.A knānat,
Ved. jānāti, etc.) and clearly old.

22 The multiply renewed productivity of the root-noun declension type in Germanic (Hansen
2017) may constitute a third “living fossil” of this type.
11 Greek


Lucien van Beek 


11.1 Introduction 


Greek is one of the earliest attested languages of the IE family, starting with 
Mycenaean in the fourteenth-twelfth century BCE (on the dating of the tablets, 
see Driessen 2008). From the so-called Dark Ages (twelfth-ninth century 
BCE), we have only one written piece of evidence in Greek (Cypriot 
O-pe-le-ta-u, perhaps mid-tenth century). Starting in the eighth century BCE, 
alphabetic inscriptions appear in various different dialects and from all corners 
of the Greek world; moreover, literary Greek starts with the Homeric epics. 

From the Mycenaean period onwards, Greek was spoken in the southern- 
most parts of the Balkan peninsula (Epirus, Thessaly, and further south) and on 
the islands in the Aegean (Crete, Cyclades) and Ionian seas. Processes of 
migration and colonization starting as early as the Mycenaean period brought 
Greek across the Aegean to the Western and Southern Asia Minor coastline, to 
Cyprus and probably the Levant, and from the eighth century onwards to Sicily, 
the Italic peninsula, the Rhone delta, Libya, Egypt, and the Black Sea region. 

Mycenaean Greek was written in a syllabic script (Linear B). With the 
destruction of the palaces, Linear B went out of use, but on Cyprus a related 
syllabary survived, most inscriptions dating to the eighth-fourth century BCE. 
All other first-millennium varieties of Greek were written in different local 
forms of the Greek alphabet, which was adopted from the Phoenician abjad 
during the Dark Ages (the exact date(s) and place(s) of adoption are still
debated).¹

Ancient Greek is attested in many (at least thirty) different dialects: from the 
beginning of the Dark Ages until the Classical period, almost every polis had its 
own local (epichoric) variety and local alphabet, reflecting the political frag- 
mentation of Greece. Broadly speaking, the following dialects are attested in 
the inscriptional record (cf. Buck 1955), divided into four main groups (see 
Section 11.3): 


This chapter was made possible by a VENI grant from NWO (Netherlands Organization for
Scientific Research) for the project Unraveling Homer’s Language.
1 An eleventh-century date has recently been proposed (Waal 2018).
* Arcado-Cypriot: Arcadian (Central Peloponnese) and Cypriot (Cyprus);
  Mycenaean is closely related to both dialects
* Ionic-Attic: Attic (Attica), Western Ionic (Euboea, Oropos), Central Ionic
  (Cycladic islands), Eastern Ionic (Chios and the Asia Minor coastline from
  Smyrna to Halicarnassus)
* Aeolic: Thessalian (Thessaly, with five regional varieties), Boeotian
  (Boeotia), and Lesbian/Aeolic proper (Lesbos and the Western Asia Minor
  coastline north of Smyrna)
* West Greek, usually subdivided into Doric and North-West Greek dialects
  (cf. Méndez Dosuna 2007b):
  * Doric dialects were spoken around the Saronic Gulf (Megarian,
    Corinthian, Eastern Argolic), on the Peloponnese (Western Argolic,
    Laconian, Messenian), on the southern Aegean islands (Cretan,
    Theran, Dodecanese (Cos, Rhodes)), and the Ionian islands (including
    Corcyrean).
  * North-West Greek dialects were spoken North of the Gulf of Corinth:
    Locrian, Phocian, Delphic, Acarnanian, Aetolian, Epirotic.²
  * The dialect of Elis has many peculiar features; that of Achaea is marginally
    attested.
  * Various West Greek dialects were transported to colonies in Magna Graecia,
    where they developed local characteristics (Syracusan from Corinthian,
    Tarentine and Heraclean from Laconian, etc.); Cyrenaean developed from
    Theran.

Pamphylian (around present-day Antalya, southern coast of Asia Minor) is
fragmentarily attested and difficult to classify (Brixhe 1976; 2013).

A linguistic description of most dialects, however, is hampered in various
ways (for a detailed methodological discussion see García Ramón 2017). First,
there are large chronological and geographic gaps in the often fragmentary
attestations of most dialects. In the archaic period, longer inscriptions (e.g. the
Gortyn Law Code) are scarce, and there are not any longer dialect texts from
Messenia, Achaea, and large parts of the North-Western realm. Second, the
range of subjects covered in prose inscriptions is narrow (mostly treaties and
regulations), and the language is often formulaic or standardized. This may also 
hold for Mycenaean, where the relative lack of variation between different find 
spots is suggestive of a bureaucratic register. Third, a tendency toward koi- 
neization starts relatively early in most areas, and the tendency to actively 
promote local dialect peculiarities in official inscriptions led to hyper-dialectal 
forms. Finally, even with the dialects that are known well (Classical Attic and, 
to some extent, Eastern Ionic), it must be taken into account that literary texts 
do not always reflect the actual linguistic situation. 


2 Many of these dialects are only fragmentarily attested.
Indeed, utilizing forms of literary Greek poses problems of a different nature.
Most archaic forms of poetry are not in local dialect, but in genre-dependent
(epic, lyric, drama, etc.) linguistic forms. Specific features became established
as markers of certain genres (e.g. feminine participles in -οισα in choral lyric,
probably reflecting the prestige of Lesbian poetry). Moreover, all genres share
a considerable body of archaic grammatical and lexical features that were
absent from most vernaculars. These features may derive from a traditional
poetic language (a “poetic Koine”) with roots in the late second millennium.

For these reasons, it is often difficult to assign features attested in literary 
texts to a specific dialect. Thus, alongside contemporary Lesbian forms, the 
language of Sappho and Alcaeus contains common poetic forms, borrowings 
from Ionic and from epic, and probably also artificial forms.³ Epic Greek has
a general Ionic phonological veneer and contains many specifically Ionic 
grammatical and lexical features. However, as the traditional language of 
verse-composition in hexameters, it also contains large numbers of archaic 
words, morphemes, and phrases. Some of these can be assigned to dialects 
other than Ionic (Aeolic, probably also Mycenaean), but often dialect assign- 
ment is difficult. Finally, a considerable number of typical Homeric forms are 
artificial creations (for an overview, see Hackstein 2010). 


11.2 Evidence for the Greek Branch 


This section aims to present all innovative developments (including significant
choices between alternatives) that set Proto-Greek apart from other branches.⁴
In combination with the virtual absence of demonstrably old divergences
between the Greek dialects, this enumeration shows that Proto-Greek existed
as a real prehistoric linguistic entity, thus disproving Garrett's provocative
claim that there are hardly any “demonstrable and uniquely Proto-Greek
innovations in phonology and inflectional morphology” (2006: 141).⁵

First, some remarks concerning relative chronology. The Mycenaean evidence
allows us to assign certain changes to the period after the adoption of Linear B (e.g.
*pi̯ > pt, or the lenition of initial yod). It is not always easy, however, to distinguish
between Proto-Greek innovations and later shared Common Greek developments.
An often-cited example is the introduction of *-wot- as the perfect participle suffix.
This innovation was formerly reconstructed for Proto-Greek because it occurs in
all first-millennium dialects (except for Aeolic, which uses the suffix *-ont-), but
Mycenaean shows that Proto-Greek retained *-woh-. However, although the 
Proto-Greek status of some of the individual changes below may be doubted, it 


3 An extensive treatment is Bowie 1981. 

4 For a similar but less extensive list, see Clackson 2007.

5 Exaggerated doubts concerning our ability to reconstruct Proto-Greek also surface in Risch's 
work (e.g. Risch 1963). 
is clear that they all took place between PIE and attested Greek; hence, the
majority will have taken place before the split into North and South Greek. 


11.2.1 Phonological Innovations Shared by All Greek Dialects 


1. Specific laryngeal vocalizations, including
   * word-initial before consonant plus vowel (*HCV-): triple reflex e, a, o⁶
   * word-initial before resonant plus consonant (*HRC-): triple reflex e, a, o
   * between two consonants (*CHC): triple reflex e, a, o; this probably
     included word-initial *RHC-, cf. μακρός ‘long’ < *mh₂k-ro- beside
     μήκιστος, μῆκος
   * *CRHC > PGr. /CRēC/, /CRāC/, /CRōC/⁷
   * *CRHV > PGr. /CaRV/ (with coloring of V by the laryngeal)⁸
   * the development of *CiHC and *CuHC remains disputed: θυμός ‘spirit’
     < *dʰuh₂mó- ‘smoke’ (Lat. fumus, Ved. dhūmá-, also Hitt. tuhhuwai-, all
     ‘smoke’) is a certain example of a long-vocalic reflex. On the other hand,
     Ved. jīvati, jīvá- and Lat. vivo, vivus seem to imply a vocalization *CiōC
     < *Cih₃C for the cognate formations ζώω ‘live’, ζωός ‘alive’
   * *-ih₂ > -ia at word end (nom.sg. of the fem. motion suffix), also *-ih₁ > -ie
     (only in dual *h₃ekʷ-ih₁ > Hom. ὄσσε ‘eyes’); it is debated whether this
     change was phonetically regular or analogical.

2. The double reflex of *i̯-, which merges with *di̯- (plus *gi̯-, *gʷi̯-) in one
subset of lexemes that have correspondences with *i̯- in other IE languages
(e.g. ζέω ‘boil’, Myc. ze-so-me-no; ζυγόν ‘yoke’ and ζεύγνυμι ‘connect’,
Myc. ze-u-ke-si), but was retained and developed into h- in another subset
(relative pron. ὅς, Myc. jo-, o- beside Ved. yáḥ; ἧπαρ ‘liver’ beside Lat.
iecur). The distribution between both reflexes, which is the same in all
Greek dialects (including Mycenaean), represents an exclusive common
innovation of Proto-Greek. The exact conditioning factor, probably the


6 The divergent initial reflex of Doric ϝίκατι ‘twenty’ ~ Classical εἴκοσι < *h₁wi-Hkm̥t-i (with
problematic o < *m̥) is unexplained, but this does not suffice to show that the laryngeals were
retained until after PGr.

7 The divergent form πρῶτος vs. West Greek πρᾶτος of the ordinal ‘first’ must reflect a contracted
superlative PGr. *pro-ato- (cf. Cowgill 1970: 123, 148). There is some evidence for a disyllabic
reflex of *CRHC: τραχύς ‘rough’ < *dʰrh₂gʰ-u-, θράσσω ‘stir’ < *dʰr(e)h₂gʰ-, but ταράσσω ‘id.’
< *dʰrh₂gʰ-. It is often claimed that the disyllabic treatment occurred only when the liquid was
accented (e.g. Rix 1992: 73), but in my view this is uncertain. Another plausible possibility is that
the disyllabic reflex was regular before /CC/, while the long vowel reflex occurred before /CV/
(van Beek 2021a).

8 *CRh₃V- may have yielded PGr. /CoRV/, with rounding of the anaptyctic vowel caused by the
following labio-laryngeal (cf. μολεῖν ‘come’, πορεῖν ‘give’ < *mlh₃-e/o-, *prh₃-e/o-). The
Lesbian form χόλαισι ‘are slack’ (Alcaeus) corresponding to Classical χαλῶσι (χαλάω) is not
sufficient evidence for positing a distinct reflex for Aeolic (pace Peters 1980: 28).


presence or absence of an initial laryngeal (cf. García Ramón 1999), is still
disputed (cf. van Beek 2019). 

3. Loss of word-final stops, including stop clusters (voc. ἄνα ‘lord’
< *wanakt).

4. Restrictions on allowed stop clusters, including developments of “thorn
clusters” (two consecutive stops are allowed only if the second stop is
dental, e.g. κτ or πτ; while τπ*, τκ*, κπ*, πκ* are disallowed). This situation
is pre-Mycenaean in view of e.g. e-qi-ti-wo-e /ekʷʰtʰiwohes/ perf.ptc.
‘perished’ from PIE *dʰgʷʰei̯-.

5. Development of voiceless aspirates /tʰ kʰ pʰ kʷʰ/ from the PIE “mediae
aspiratae”, already completed in Mycenaean (cf. te-o /tʰe(h)os/ ‘god’ from
PIE *dʰh₁s-ó-; but contrast Section 11.4 on Macedonian and Phrygian).

6. Development of a circumflex accent: the pitch on long vowels may fall on 
the first mora (circumflex accent) or on the second mora (acute accent). 
The distinction was probably phonologized when early contractions took 
place, not long after the loss of intervocalic laryngeals (e.g. τιμῆς gen.sg.
< *-éh₂-os vs. τιμή nom.sg. < *-éh₂).

7. The Law of Limitation: the pitch accent can be assigned only to the last 
four morae of a prosodic word, and only to the last three morae if the final 
syllable is accentually long. 

8. Lenition *s > h in different positions: (a) word-initially before vowels or 
R (= any liquid, nasal, or glide); (b) between vowels and in the intervocalic 
clusters *-sR- and *-Rs- (probable exception: -rs- and -ls- were not lenited
if the directly preceding syllabic nucleus carried the accent). 

9. The syllabic nasals yielded a nasal vowel [ã] or [ə̃] in Proto-Greek. This
normally merged with /a/ in all dialects, but in some dialects we also find
/o/ under specific, yet still uncertain, conditions (perhaps in a labial
environment).⁹

10. Cowgill's Law, i.e. *o > u in certain environments involving labials and 
nasals. In various words this raising occurs in all Greek dialects, e.g. νύξ
‘night’ < *nokʷt-. However, not all dialects show this raising in the same
words (cf. Ion.-Att. ὄνομα vs. Dor. Aeol. ὄνυμα), and the conditions are
still in part uncertain; see Vine 1999.
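
The positional conditions given under (8) can be made concrete with a small sketch. It is an illustration only and rests on assumptions that are not part of the account above: forms are written in a simplified ASCII transcription, R is taken to be any of l, r, m, n, w, j, the accent-conditioned exception for -rs- and -ls- is not modelled, and the function name `lenite_s` as well as the input strings are hypothetical toy inputs rather than reconstructions cited in this chapter.

```python
import re

VOWELS = "aeiou"
RESONANTS = "lrmnwj"  # assumed inventory for R: liquids, nasals, glides


def lenite_s(form: str) -> str:
    """Apply *s > h in the positions listed under (8), ignoring accent."""
    v, r = VOWELS, RESONANTS
    # (a) word-initial *s before a vowel or R
    form = re.sub(rf"^s(?=[{v}{r}])", "h", form)
    # (b) *s between vowels
    form = re.sub(rf"(?<=[{v}])s(?=[{v}])", "h", form)
    # (b) intervocalic cluster *-sR-
    form = re.sub(rf"(?<=[{v}])s(?=[{r}][{v}])", "h", form)
    # (b) intervocalic cluster *-Rs- (the accentual exception is NOT modelled)
    form = re.sub(rf"(?<=[{v}][{r}])s(?=[{v}])", "h", form)
    return form


if __name__ == "__main__":
    for toy in ("septa", "genesos", "esmi", "sta"):
        print(toy, "->", lenite_s(toy))
```

In this sketch word-final s and s before a stop are left untouched, which is why the toy input `sta` comes back unchanged.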

The laryngeal changes under (1) are mostly specific to Greek, but some are 
shared with Phrygian (Section 11.4.2). This may also hold for developments (3) 
and (4), which are equally attested in Phrygian, although the Greek loss of final 
stops is difficult to date (the Linear B syllabary does not make it possible to 
determine whether they were present in Mycenaean or not; contrast also Phryg. 
voc. -vanak with Gr. ἄνα ‘Lord!’). The Law of Limitation is difficult to date as
we have no evidence for accentuation in most dialects. 
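
The Law of Limitation as worded under (7) can be restated as a simple check on mora positions. The following is a minimal sketch of that wording only, not a full model of Greek accentuation: it assumes that each syllable is represented by the mora count of its nucleus (1 = short, 2 = long), that a final syllable with two morae counts as “accentually long”, and that the accent is given as the position of the accented mora counted from the end of the word. The function name `satisfies_limitation` and the syllable shapes used below are hypothetical.

```python
from typing import List


def satisfies_limitation(morae_per_syllable: List[int],
                         accent_mora_from_end: int) -> bool:
    """Check the Law of Limitation as stated in (7).

    morae_per_syllable: mora count per syllable, first to last.
    accent_mora_from_end: accented mora, counted from the end (1 = last mora).
    """
    total = sum(morae_per_syllable)
    if not 1 <= accent_mora_from_end <= total:
        raise ValueError("accent position lies outside the word")
    # Assumption: a two-mora final syllable is treated as accentually long.
    final_long = morae_per_syllable[-1] >= 2
    window = 3 if final_long else 4
    return accent_mora_from_end <= window


if __name__ == "__main__":
    # short + long + short, accent on the fourth mora from the end
    print(satisfies_limitation([1, 2, 1], 4))
    # short + long + long, same accent position
    print(satisfies_limitation([1, 2, 2], 4))
```

With a short final syllable the window is four morae wide; lengthening the final syllable narrows it to three, so the same accent position is no longer admissible.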


9 Discussion of the evidence in Thompson 1996-7: 316-20.
A development *CRHV- > *CaRV- is also found in Italic and Celtic, but it is
probably independent, as in those branches a-coloring of anaptyctic schwa is
unsurprising. The vocalization in (9) may be independent of that in Indo-Iranian,
as the Greek outcomes /a/, /o/ postdate the Graeco-Phrygian stage
(*n̥ > Phr. an).

Certain developments involving clusters of stop plus glide are also likely to 
be Proto-Greek: 

11. Intervocalic *t(ʰ)i̯ merges with PGr. *ts (Ion.-Att. μέσος < *medʰi̯os, τόσος
< *toti̯os; Arc. μέσος; Myc. to-so; Boeot. μέττος; most other dialects
μέσσος; older Cretan may preserve /ts/). In productive formations, *t(ʰ)i̯
was restored; its reflex merged with that of *k(ʰ)i̯ in most dialects but not in
Mycenaean.


11.2.2 Morphological Innovations: Verbal Stem Formation and Endings 


12. Development of an aorist in -θη-, in addition to the inagentive aorist in -η-
(which reflects “stative” *-eh₁-). The exact origin and genesis of this
formation are still disputed.

13. Creation of a κ-perfect, where -κ- was originally found only in the indic.
sg.¹⁰ Greek productively extended this morpheme (perhaps originally an
aorist marker, cf. unreduplicated Lat. fēcī, iēcī beside ἔθηκα, ἧκα), first to
intransitive perfects of long-vocalic roots (e.g. πέφυκα, ἕστηκα), later also
to transitive perfects (e.g. λέλυκα) and other stem types.

14. Replacement of the perf.act.3pl. ending *-er with *-n̥ti, reflected as -ατι in
WGr. dialects and as -ασι in Arcadian (Buck 1955: 112). This ending was later
adapted to *-anti (> Att.-Ion. -ᾱσι) in most dialects.

15. The “alpha-thematic” sigmatic aorist paradigm, which was based on the
1sg. after the word-final change *-m̥ > -a; the 3sg. received the thematic
ending -e after the loss of *-t.

16. Replacement of the stative endings by the middle endings 3sg. -to, 3pl. -nto. 
17. Creation of new secondary middle endings 1sg. *-man (unique to Greek)
and 2sg. *-so (as in other branches, including Italic and Germanic). 

18. Creation of primary middle endings in -i. 

19. Development of a medio-passive perfect stem (see Section 11.4.2). 

20. Creation of an active pluperfect with a suffix *-e- and alpha-thematic
endings (Hom. ἐπεποίθεα).¹²


10 Cf. Att. ἑστώς, τεθνεώς beside ἕστηκα, τέθνηκα.

11 It cannot be excluded, however, that the PIE stative endings 1sg. *-h₂, 2sg. *-th₂o were
originally distinct from middle *-mh₂, 2sg. *-so. Cf. Kortlandt 1981.

12 However, the antiquity and spread of this formation are difficult to assess. The irregular Homeric
pluperfect ᾔδη ‘knew’ is certainly old; it has been compared with PCelt. *wedr < *u̯ei̯d-eh₁- by
Schrijver (1999).
21. Certain productive reduplication patterns:
a. default Ce- (perfect stem), Ci- (present stem) for roots with simple onsets
b. “Attic reduplication” in roots starting with a vowel (e.g. ἐλυθ- →
ἐληλυθ-)
c. full reduplication in roots of the structure /VC-/ (e.g. ἀρ- → ἀρηρ-)
d. /e-/ in the perfect of roots with complex onsets (e.g. perf.mid. ἔζευγμαι).
22. The infinitive endings:
a. thematic *-e-hen (e.g. Myc. e-ke-e /ekʰehen/ ‘have’, Att. -ειν, etc.)
b. athematic *-men, *-menai (Lesb. ἔμμεναι ‘be’) and *-hen (Myc. te-re-ja-e
/teleiāhen/ ‘fulfill’), *-henai (Att. ἰέναι ‘go’)¹³
c. *-(t)sai (s-aorist)
d. *-stʰai (middle).
23. Creation of a denominative factitive class in PGr. -o- (type -όω), see
Tucker (1990).


11.2.3 Morphological Innovations: The Cases, Nominal Endings, 
and Nominal Stem Formation 


24. The PGr. dat.-loc.pl. ending -si (for PIE *-su) arose by analogical introduction
of -i from the loc.sg. ending, probably aided by instr.pl. *-bʰi.¹⁴

25. Case syncretism: Proto-Greek merged the dative and locative plural of all 
declensions (PGr. *-oisi, -asi, -si). 

26. Greek has various clitics and suffixes marking spatial relations: *-de
cliticized to the accusative of direction, e.g. οἰκόνδε ‘home’ (already
Mycenaean), *-tʰi (locative, e.g. οἴκοθι ‘at home’), *-tʰen (ablative, e.g.
πάντοθεν ‘from all sides’), but also local *-tʰn̥ > -θα as well as *-tʰe after
local adverbs; at least *-tʰi and *-tʰen originated in adverbial pronouns (cf.
πόθι ‘where’, πόθεν ‘whence’) and were innovations of Proto-Greek.

Proto-Greek had more innovations (e.g. the introduction of nom.pl. endings -oi,
-ai in the first and second declension, the extension of the 3rd decl. n.pl. ending
-ă < *-h₂ to thematic stems replacing the reflex of *-eh₂, or the generalization of
the 3rd decl. gen.sg. ending -os to the exclusion of *-es). However, since most
of them are shared with various other branches and are fairly trivial
developments, they cannot be utilized for purposes of subgrouping.
In nominal stem formation, innovations include:

27. The suffixes -ēu- (masculine persons or professions), -ad- and -id- (denoting
appurtenance).


13 For the relation between *-hen(ai) and *-men(ai), see van Beek in press. The suffixes *-men- 
and *-hen- could both be extended with -ai under certain specific conditions. In Lesbian, -μεναι
occurs only with monosyllabic stems containing a short vowel. 

14 Pace Garrett (2006: 140), this is not “a trivial adaption”.
28. The extended form in -t (Classical -ματ-, -ατ-) of the suffixes *-mn̥-,
*-r/-n- in neuter nouns.

29. The extended form of the comparative suffix *-is-on- > -ίων (unattested in
Myc., though).

30. The use of *-tero- as a comparative suffix with gradable adjectives. 

31. The superlative suffix *-(t)m̥to- > -(τ)ατος, replacing *-(t)m̥Ho- (cf. Lat.
intimus ‘innermost’, Ved. ántama- ‘nearest’).


11.2.4 Pronouns


32. Acc.pl. of the personal pronouns in -me (generalized orthotonic forms
*n̥s-mé, *us-mé).

33. Reshaping of the nom.pl. *u̯ei, *i̯us of the personal pronouns after the acc.:
*n̥sm-es, *usm-es (cf. Dor. ἁμές, ὑμές; Aeol. ἄμμες, ὔμμες).

34. The dative of personal pronouns in -i(n): clitic Ion.-Att. ἧμιν, orthotonic
Dor. ἁμίν, Lesb. ἄμμι(ν) (contrast Ved. dat. asmé < *-me-i).

35. Creation of a stem form σφε- beside σφι(ν) ‘to them(selves)’, probably
a clitic form of PIE *se-bʰei.

36. Grammaticalization of anaphoric/demonstrative οὗτος, αὕτη, τοῦτο (intermediate
deixis) from *so (h₂)u plus *to- (the first part corresponds to Ved.
sa u and the nom.sg. pronoun PIr. *hau (OAv. huud /hau/, OPers. hauv),
Ved. asáu).

37. Creation of the demonstrative κεῖνος / ἐκεῖνος (distal deixis).

38. Reflexive αὐτός ‘same; self’, grammaticalized from *h₂eu ‘again’ plus
anaphoric-demonstrative *to-.

39. Creation of a negation οὐκ(ί), οὐ, probably from *(ne) ... *h₂oi̯u kʷid
(Cowgill 1960).


11.2.5 The Lexicon and Remaining Innovations 


Lexical innovations are more difficult to utilize for the purpose of subgrouping,
but they may complement the picture gained from the phonological and
morphological innovations. Some typical lexical innovations of Greek are (a
full list would be much longer):

40. The verb ‘wish, choose’ has a root PGr. *gʷel- or *gʷol- instead of PIE
*u̯elh₁- (βούλομαι, Arc., Eub. βόλομαι, Thess. βέλλομαι, WGr. δείλομαι, etc.).

41. The verb ‘die’ has the root PGr. *tʰnā-, *tʰana-.

42. The word for ‘guest, stranger’ is PGr. *ksenwo-.

A large part of the Greek lexicon was borrowed from the indigenous
language(s) of the Hellenic peninsula. Beekes (2014) views this as one single
non-Indo-European language which he calls “Pre-Greek”, but while the Greek
lexicon indeed has an important non-Indo-European element, it is difficult to
determine when, where, and from how many different varieties this material
was taken. The forms πύργος ‘fortification’ < *bʰ(o)rgʰ- and τύμβος ‘grave’
< *dʰ(o)mbʰ- presuppose an Indo-European donor language.


11.3 The Internal Structure of Greek 


The Ancients distinguished four main dialects of Greek: Attic, Ionic, Doric, 
and Aeolic. As they recognized that Attic and Ionic were very closely related, 
a basic three-way distinction is implied (also reflected in the three Hellenic 
tribes and their ancestors Δῶρος, Ξοῦθος, and Αἴολος in Hesiod fr. 9 M-W).
However, ancient scholarship was interested mainly in literary languages, not 
in spoken dialects (see Tribulato 2019). 

After the decipherment of the Cypriot syllabary, however, scholars quickly 
realized that Arcadian and Cypriot were much more closely related to each 
other than to Thessalian and Boeotian, and that the Ancients used “Aeolic” as 
a catchall term for anything that was not Ionic, Attic, or Doric. Even so, the 
threefold distinction (and the inclusion of Arcado-Cypriot among the Aeolic 
dialects) was largely maintained.¹⁵ In fact, the theory that Ionians, Aeolians,
and Dorians existed as distinct ethnic and linguistic groups as early as 
2000 BCE, and that they migrated into the Hellenic peninsula in three 
chronologically distinct waves (Kretschmer's Wellentheorie), held sway for a 
long time. 

This picture was changed radically by two landmark studies, Porzig 1954 
and Risch 1955; see also Risch 1963. Both scholars independently showed that 
Arcado-Cypriot was a distinct dialect group with close genetic ties to Ionic- 
Attic. Moreover, both argued that Asia Minor Aeolic (Lesbian) had been 
influenced substantially by neighboring Ionic dialects, and that East 
Thessalian is the most conservative Aeolic dialect. In addition, Risch made 
a plausible argument for reconstructing a first split into North Greek and South 
Greek (comprising Arcado-Cypriot and Ionic-Attic) in the early second
millennium.¹⁶ It is now widely accepted that South Greek is characterized by
the following exclusive innovations:

• assibilation *ti > /si/ (e.g. 3sg. δίδωσι)
• simplification PGr. *ts and *ss > s, also after short vowels (e.g. μέσος)¹⁷


15 For a good summary of earlier works on Greek dialect classification and subgrouping, see
Morpurgo Davies 1992. 

16 Many scholars still use the terms West Greek and East Greek (cf. Porzig 1954) instead of Risch’s 
North Greek and South Greek, respectively. In order to avoid confusion, I stick to Risch's 
terminology and reserve “West Greek” for the dialect group that comprises all Doric and 
Northwest Greek dialects. 

17 According to Risch, *ts > s fed the assibilation *ti > si, but the antiquity of (*ts >) ss > s cannot
be proven because Linear B does not write geminates (Myc. to-so corresponding to Ion.-Att.
τόσος).
• athematic infinitives *-(h)en, *-(h)enai (Dor. and Aeol. -μεν, -μεναι)¹⁸
• correlative temporal adverbs in /-te/, e.g. τότε ‘then’ (Aeol. -τα, Dor. -κα)
• temporal conjunction εἰ (Dor. Aeol. αἰ), but Cypr. has e-
• nom.pl. τοί, ταί of the demonstrative replaced by οἱ, αἱ (probably also
Aeolic).
There are few (if any) old innovations that are characteristic for all North Greek 
dialects. The best candidate is the e-vocalism of the present stem ‘want’ (Thess. 
βέλλομαι, WGr. δείλομαι, etc.), but it remains uncertain whether this is a shared
innovation rather than an archaism. It is likely that certain distinctive Aeolic 
innovations occurred between the separation of South Greek and the twelfth 
century (Section 11.3.7). 
Following Risch, we may distinguish three periods: 
a. Mycenaean period (relative stability, probably increasing local differentiation) 
b. Dark Ages (high mobility; rapid language change, convergence) 
c. ninth century BCE until the Classical period (the dialects occupy their histor- 
ical locations; colonization movements; increasing local differentiation). 
Various linguistic innovations can be assigned to one of these periods, based on 
(1) relative chronology, (2) linguistic geography, and (3) their presence or 
absence in Mycenaean.¹⁹


11.3.1 Mycenaean 


Mycenaean is clearly a South Greek dialect, as evidenced by the assibilation 
of voiceless dental stops (e.g. di-do-si /didonsi/ ‘they give’), the conjunction 
o-te ‘when’, and an athematic infinitive in /-hen/ (te-re-ja-e /teleiāhen/
‘fulfill’). 

Apart from this, however, the position of Mycenaean relative to the first-millennium
dialects is less clear.²⁰ Arcadian and Cypriot are closely related
dialects, but it must be borne in mind that most exclusive Arcado-Cypriot
innovations are not attested in Linear B (see below). An exception in this
respect might be Myc. pe-i /spʰehi/, an innovation which arose by adding the
dat.pl. ending to acc. *spʰe, replacing the older form σφι (Ion., Hom.). This
form is continued in Arcadian σφεσιν (SEG 37, 470.15) with -hi replaced by
-si(n), and σφεις (IG V 2, 6.10) with added -s after contraction.²¹


18 But cf. van Beek in press, arguing that -μεν was preserved longer also in South Greek, and that
Proto-Greek had both *-hen and *-men; the choice depended on whether the paradigm had
ablaut or not.

19 As for linguistic geography, features shared exclusively by non-contiguous dialects are plausibly
analyzed as shared innovations stemming from an earlier period when these dialects were in
direct contact.

20 See Cowgill 1966 for an overview of earlier literature on the position of Mycenaean.

21 See the discussion in Morpurgo Davies 1992: 429-30.


Risch (e.g. 1955) claimed that there were no noticeable differences between
Mycenaean and Proto-Ionic in the fourteenth or thirteenth century BCE. For 
this, he has been widely criticized (see Cowgill 1966). It is difficult to disprove 
that all characteristic innovations of Ionic-Attic (beyond general South Greek 
features) took place after the Mycenaean period, but Mycenaean has also 
undergone changes that are not paralleled in any first millennium dialect (cf. 
García Ramón 2016: 242-3): 

• raising e > i before labial sounds
• palatalization of /sk/, as evidenced by the orthographic variation a-ke-ti-ri-ja
~ a-ze-ti-ri-ja /(*)asketriai/ (Méndez Dosuna 1993)
• neuter nouns in -mo(t-) (e.g. pe-mo ‘seed’) instead of -ma(t-).

Several scholars have viewed these features as reflecting dialectal or sociolinguistic
differences among Mycenaean scribes (“normal” vs. “special” Mycenaean, in the
terms introduced by Risch 1966; monographic discussion in Hajnal 1997), but the
evidence is far from clear, and it has alternatively been explained by Thompson
(1996-7) as orthographic variation reflecting language change in progress.


11.3.2 Arcado-Cypriot 


Arcadian and Cypriot are closely related South Greek dialects, but are they
closer to each other than to Mycenaean or Proto-Ionic? Morpurgo Davies
(1992) has shown that Proto-Arcado-Cypriot can be sensibly reconstructed.
The following features are relevant:²³

• raising *en-, on- > in-, un- in the preverbs/prepositions ἰν, ὐν (= Att. ἀνά)
• word-final -o > -u and diphthongization in the gen.sg. -āo > Arc. -αυ, Cypr. /-au/
• analogical nom.sg. -ής of nouns in -εύς (after acc. -ήν)
• demonstrative ὄνυ (= Ion.-Att. ὅδε)
• ἀπύ and ἐξ governing the dative, not the genitive
• preverb/preposition /pos/ (Arc. πός, Cypr. po-se) instead of Ionic-Attic πρός
• generalization of the by-form /kas/ (Arc. κάς, Cypr. ka-se) of the conjunction
καί.

With the exception of some Pamphylian forms, the above isoglosses are
exclusive.²⁴ Interestingly, most of the common features of Arcado-Cypriot


22 Here might also be mentioned the desyllabification of /i/ before vowels and the subsequent
palatalization of velars, e.g. su-za /sūtsa/ < *sūki̯ā < *sūkiā ‘fig tree’, but note that desyllabification
of /i/ also occurs in Aeolic dialects.

23 Cf. García Ramón (2010: 227-9; 2017: 78-9). This list excludes lexical choices, which mostly
concern words otherwise preserved only in epic Greek, e.g. αἶσα ‘lot; fate’. Unlike García
Ramón, I exclude the palatalization of *ki- (cf. Arc. vic, Cypr. si-se, Att. vic) because it is not an
exclusive isogloss with Arcadian, and the regular reflex of *r̥ (Arc. has ορ, but the evidence from
Cypriot is somewhat ambiguous).

24 Another salient feature of Arcado-Cypriot, the athematic inflection of contract verbs, is shared
with Aeolic (Thessalian, Lesbian). It is unclear to what extent this represents a shared 


seem to be post-Mycenaean innovations: this is certain for nom.sg. -ής beside
Myc. -e-u and for the syntax of ἀπύ and ἐξ. As for the raising of en- and of
word-final -o, these phenomena are not attested in Mycenaean spelling.
Finally, note that Myc. has disyllabic po-si corresponding to /pos/, and that
it may reflect either *poti or *pr̥ti.

Various features in which Arcadian and Cypriot diverge may be plausibly 
assigned to the period after 1200. Thus, the labial reflex of *kʷe in Cypr.
pe-i-se-i ‘will pay’ (Att. τείσει) is the default outcome of a labiovelar, while
the Arc. reflex /te/ can be part of a development shared with the continuum of 
West Greek dialects and Ionic-Attic. 

As we saw, Mycenaean has a few innovations not present in Arcadian and 
Cypriot, but the three dialects also share the exclusive innovation /spʰehi/ for
/spʰi/. Thus, both first-millennium dialects reflect vernaculars spoken in the
Peloponnese that diverged slightly from the administrative language written in 
Linear B but were closely related to it. The common innovations of Arcado- 
Cypriot may have come into being in the course of the thirteenth or twelfth 
century BCE, before the migration to Cyprus. 


11.3.3 Ionic-Attic


Proto-Ionic can be reconstructed fairly well. Exclusive shared innovations
between Attic and all Ionic dialects include:

• fronting *ā > /æː/
• Quantitative Metathesis (there were two rounds: one preceding and another
following intervocalic w-loss)
• nom. and acc.pl. ἡμεῖς, ἡμέας and ὑμεῖς, ὑμέας replacing PGr. forms in *-es,
-e (Lesb. ἄμμες, ἄμμε)
• dat.pl. orthotonic ἡμῖν, ὑμῖν (replacing -i(n), cf. Lesb. ἄμμι)
• athematic imperf.3pl. (and pluperfect) -σαν, from the sigmatic aorist,
replacing *-(h)an
• 3sg. *ēs ‘was’ (etymologically expected from *e-h₁es-t, and attested in WGr.
ἦς) was replaced by ἦν (originally 3pl. ‘were’); the latter was replaced as a
3pl. form by ἦσαν
• certain typical contractions (Buck 1955: 37-43), notably *ae > Ion.-Att. ᾱ
(Dor. η).

Proto-Ionic probably underwent most of these exclusive innovations before the
Ionian migrations to Asia Minor, which are conventionally dated to the mid-eleventh
century.²⁵ A number of further innovations are isoglosses, due to


innovation. The athematic 3pl. secondary ending /-an/ (Arc. &deav, Cypr. ka-te-ti-ja-ne) is also 
found in Boeotian and is reconstructible for Proto-Ionic. 
25 In addition, Proto-Ionic underwent an early loss of word-initial and intervocalic *w.
convergence, with neighboring West Greek dialects; they may have spread in
the twelfth or eleventh century:

• word-internal *r̥ > αρ (ρα in epic Greek or analogical, van Beek 2013; 2022)²⁶
• the 1st compensatory lengthening and isovocalic contractions, leading to
a seven-vowel system
• the 2nd compensatory lengthening
• dental outcomes of labiovelars before front vowels (cf. also Arc.)
• thematic inflection of contract verbs
• mid.3sg. -ται ← *-toi (also Aeolic)
• impv.act.3pl. -ντων < -ντω + ν (also in Delphic, Cretan, Theran; contrast -ντω
in most other dialects, Lesb. -ντον).

It remains uncertain to what extent Proto-Ionic had already innovated with
respect to Mycenaean-like dialects in the thirteenth century. The apparently
clear distinction in the reflexes of *r̥ (Ionic-Attic αρ, Mycenaean spelled with
the o-series) is difficult to use as evidence because a retention of *r̥ in
Mycenaean cannot be excluded, and the same might be true of Proto-Ionic at
this date (van Beek 2013; 2022). The outcome of secondary *t(ʰ)i̯ was Proto-Ionic
*ts but is spelled with the s-series in Mycenaean (e.g. pe-de-we-sa ‘with
feet’), which may represent either /ts/ (Crespo 1985) or /ss/ (Viredaz 1993); in
the latter case, Mycenaean would have innovated with respect to Proto-Ionic.

With the migrations across the Aegean, various local varieties of Ionic
developed. The main division is between Western dialects (subdivided into
Attic and Western Ionic) and Eastern dialects (subdivided into Central and
Eastern Ionic); it includes the following characteristic innovations:

• *ts > σσ (Eastern and Central Ionic), ττ (Attic, Western Ionic)
• loss of *w after R, s with compensatory lengthening (Eastern Ionic), or
without compensatory lengthening (Attic, Western Ionic)
• *rs > ρρ (Attic, Western Ionic)
• reversion *ǣ > ā after i, e, r (Attic, perhaps Western Ionic)
• loss of h- (Eastern Ionic)
• rhoticism, i.e. s > r between vowels and word-finally (Western Ionic).

Some of these developments are shared with neighboring dialects (Boeotian,
Lesbian).


11.3.4 The Unity of Aeolic and the Position of Proto-Aeolic 


The need to reconstruct Proto-Aeolic has been forcefully defended by García 
Ramón (2010), reacting to the superficial treatment by Parker (2008).²⁷ García


26 It is uncertain whether αρ or ρα was the regular reflex in mainland West Greek dialects, but a as
an anaptyctic vowel is certain. 

27 On this issue, and on the internal subgrouping of Aeolic, see also the unpublished dissertation by 
Scarborough (2016). 
Ramón argues that the Aeolic dialects were linked in the twelfth century BCE
not only by shared innovations but also by a number of common selections
among different alternatives and common retentions.²⁸ Clear shared innovations
exclusive to all three Aeolic dialects are

• *r̥ > ρο
• labial reflexes of the labiovelars before front vowels²⁹
• ρι > ρε (Lesb. Δαμοκρέτω for class. Δημοκρίτου, Thess. κρεννέμεν for class.
κρίνειν, Boeot. τρέπεδδα ‘table’ from *tripedza, cf. Hsch. τρίπεδδαν)
• the sigmatic aorist in -σσ- of stems in a vowel, analogically extended from
stems in -s-
• the perfect participle in -οντ-.

The change *r̥ > ρο has gained significance in the light of my investigation of
the place of the anaptyctic vowel (van Beek 2013; 2022): the regular reflex is ρο
in Aeolic dialects, but not in Mycenaean (which has either *r̥ or ορ) or Arcadian
(ορ). This makes *r̥ > ρο an exclusive innovation of all three Aeolic dialects,
which may be dated to the late Mycenaean period or before.
The following features might be added:

• 3rd declension dative plural in -εσσι³¹
• feminine ἴα ‘one’ (Lesb., Thess., Boeot.) vs. μία (all other dialects)³²
• thematic inf. -έμεν (Thess. and Boeot.), but only if Lesb. -ην is due to Ionic
influence
• temporal adverbs in -τα (Lesb. and Thess.), if Boeot. -κα is from West
Greek.³³

According to Risch (1963), more fully elaborated by García Ramón (1975),
there is no hard evidence for an Aeolic subgroup in the Mycenaean era. García
Ramón dates the above innovations to the twelfth or even eleventh century.


28 I agree with García Ramón that common choices between alternatives are also significant for
subgrouping, but I disagree with his emphasis on the significance of common retentions (such as
the patronymic adj. in -ιος, which is also preserved in Mycenaean but replaced by the gen. of the
father's name in WGr. and Ion.-Att.).

29 Exceptions are the clitics τε < *kʷe and τις < *kʷis in all three Aeolic dialects; the Perrhaebian
form κις may have been generalized from negated *ou-kis.

30 For a more extensive list of features, see Méndez Dosuna 2007a. I have left aside the desyllabification
*CRiV > *CRi̯V, which leads to partly different results in Thess., Boeot., and Lesb.,
but may still reflect an early common tendency of the three dialects (García Ramón 2010: 223-4
and 225). Hajnal (2007: 151-2) sees evidence for this change in Mycenaean and views it as an
isogloss with early Aeolic.

31 Although -εσσι also occurs in some subtypes of 3rd declension stems in various West Greek
dialects, it was the only current 3rd declension ending (excepting s-stems, where both -εσσι and
-έεσσι occur) in all three Aeolic dialects. García Ramón's view (1975: 83-4) that it arose after
the split-up of Proto-Aeolic seems unlikely to me for reasons I will discuss elsewhere.

32 The reconstruction of the PGr. form is debated: does ἴα reflect a reduced form *smia- > *sia- that
was leveled from the oblique cases, or does it reflect a different pronominal stem? This issue
does not, however, change the significance of the presence of ἴα in all Aeolic dialects (García
Ramón 2010: 225-6).

33 See García Ramón 2010: 232 and 2017: 43-4 on Thess. rota and oxke (< *hota- ke). 
However, a number of typical Aeolic innovations probably pre-dated the
turmoil of the Dark Ages. For instance, since the Aeolic dialects were not 
affected by the palatalization processes of labiovelars found in West Greek, 
Ionic-Attic, and Arcadian, the development to labials is best seen as an earlier 
innovation of Proto-Aeolic. It is more likely that the differences between West 
Greek and Aeolic developed gradually over the course of the Mycenaean 
period. 

Lesbian also has features not shared by Thessalian and Boeotian, including
• assibilation *ti > σι
• preverb/preposition πρός (against ποτί)
• o-vocalism in βόλλομαι ‘want’ (against Thess. ptc. βελλόμενος, Boeot.
βειλόμενος)
• εἰς, ἐς (< *ens) + acc. ‘into’ (against ἐν + acc.)
• thematic infinitives in -ην (against -έμεν)
• athematic infinitives in -ν and -μεναι (against -μεν).³⁴

These divergences are usually accounted for by assuming that the Lesbian 
features arose in contact with Ionic (Risch 1955). Indeed, the preverbs 
πρός and εἰς, ἐς might be borrowings from Ionic, and βόλλομαι might be
a crossover between earlier βέλλομαι and Ionic βούλομαι. The evidence for
*ti > σι, however, is problematic: Lesbian seems to have undergone
a sound change, but this would be unexpected as the result of contact
since first-millennium Ionic did tolerate /ti/ again. We may therefore
envisage a different scenario in which the second-millennium precursor
of Lesbian took part in at least one archaic South Greek innovation (*ti >
σι) and also in the exclusive isoglosses just listed with Thessalian and Boeotian,
without taking part in later exclusive South Greek innovations.³⁵ This would be
compatible, for instance, with a localization of pre-Lesbian on the southeastern 
fringes of Thessaly, in what was certainly part of the Mycenaean realm, or even 
in Boeotia. In other words, Lesbian would be a bridge dialect between South 
Greek and Aeolic (thus already Chadwick 1956: 48). 

As for Boeotian, this dialect did not undergo all the innovations shared by 
Thessalian and Lesbian. For this reason, García Ramón 1975 assumes that its 
speakers migrated into Boeotia in the mid-twelfth century, and that Thessalo- 
Lesbian underwent a couple of further innovations, including the characteristic 
Aeolic gemination (in contrast to compensatory lengthening of the vowel in 
most other dialects), before the Lesbian migration. 


34 The athematic infinitive in -μεναι is often included in the evidence for influence of Ionic on
Lesbian: it is supposed to be a contamination of Aeol. -μεν and Ion. -ναι. However, -μεναι may
be an archaism inherited from Proto-Greek (García Ramón 2009) or an inner-Lesbian extension
of *-men. See van Beek in press.

35 Similarly, but different in the details, Finkelberg 2017. For the athematic infinitives, see van
Beek in press.


11.3.5 Doric and North West Greek Dialects as Varieties of West Greek


West Greek dialects are characterized mainly by the absence of specific
innovations of South Greek (e.g. assibilation of *ti) and/or Aeolic (e.g. thematic inf.
in -εμεν), i.e. by retained archaisms, but they also underwent a small number of
common innovations.³⁶ These pan-West Greek innovations must be projected
back into the Mycenaean period: if they were later isoglosses it would be
difficult to understand why Attic and Arcadian do not share them.

Innovations include: 

* the so-called “Doric future” in -σέω (also found in all NWGr. dialects), which
arose through contamination of -σω and the “Attic” future in -έω

* aorist and future stem in -ξ- of all verbs in -ζω

* the numeral τέτορες ‘4’, with analogical -τ- for *-tu̯- (perhaps after *kʷetr̥to-)

* lexical: e.g. ἱαρός instead of ἱερός or ἱρός, Ἄρταμις instead of Ἄρτεμις (cf. also
Myc. gen. A-ti-mi-to).

Choices between alternatives include: 

* /a/ < *n̥ in the numerals ϝίκατι ‘20’ (also in Thess. ἴκατι, Boeot. ϝίκατι,
without prothetic vowel) and -κάτιοι ‘-hundred’

* generalization of the ancient primary 1pl. ending -μες (SGr. and Aeol. -μεν)

* temporal adverbs in -κα (also in Boeotian); contrast SGr. -τε, Thess. and
Lesb. -τα

* the anaphoric pronoun νιν (contrast Myc. /min/, Ion. μιν)

* modal particle κα, elided κ’ (also in Boeotian; Thess. Cypr. κε, Lesb. κεν,
Arc. and Ion.-Att. ἄν)

* ordinals πρᾶτος ‘first’ (also in Boeotian) vs. Att. πρῶτος, both from *pro-atos
(Cowgill 1970: 123 and 148), ἕβδεμος ‘seventh’ vs. Att. ἕβδομος, and the
cardinal τετρώκοντα ‘forty’ vs. Att. τετταράκοντα.

Interestingly, West Greek dialects appear to diverge in their treatment of *r̥ (van
Beek 2013; 2022). Cretan dialects have a regular anaptyxis before /r/, and
probably a conditioned reflex: αρ normally, but ορ after labials. On the other
hand, the dialects of Elis and Corinth (and its colony Syracuse) seem to have
the regular anaptyctic vowel after /r/ (e.g. ἔπραδες for ἔπαρδες ‘you farted’ in
the Syracusan poet Sophron). This would have the important consequence that
Proto-West Greek retained *r̥ until Dorians settled on the Peloponnese and
Crete in the twelfth-eleventh century BCE.

36 Cf. Méndez Dosuna 2007b for a complete list including more examples, but with some different
choices.

Since the nineteenth century, West Greek has been subdivided into “severe
Doric” (characterized by a system with five long vowels) and “mild Doric” (seven
long vowels, with /e:/ and /o:/ from contractions and the 1st compensatory
lengthening, as in Ionic-Attic). In addition to this, Bartoněk (1972) pointed out
the existence of “middle Doric” (seven long vowels, with /e:/ and /o:/ from
contractions, but /ɛ:/ and /ɔ:/ from the 1st compensatory lengthening). According to
Bartoněk the severe Doric dialects form a distinct subgroup of West Greek, but most
scholars now suppose that the various different long vowel systems of West Greek
dialects took their shape in the late second / early first millennium BCE and kept
developing afterwards (Méndez Dosuna 1985; Ruijgh 2007). Indeed, Elean attests
yet another different system with six long vowels and its own peculiar history.

Doric and the North-Western group are best seen as deriving from a more or 
less undifferentiated West Greek. Except for the creation of *ens + acc. ‘into’, 
which is shared with Ionic-Attic, there are no common innovations of the 
Doric dialects to the exclusion of NWGr. (Méndez Dosuna 1985; see Méndez 
Dosuna 2007b: 445 for an overview of relevant features). Moreover, due to 
the lacunary attestation of many North-Western dialects, it remains uncertain 
whether they formed a distinct branch of West Greek, or rather a convergence 
area. 


11.3.6 The Status of Pamphylian 


Even the few data we have for Pamphylian make it clear that the dialect cannot
be assigned to one of the groups discussed above: it has, for instance, the
athematic infinitive ἀφιιέναι (South Greek), dative plural in -εσσι (Aeolic,
NWGr.), hόκα = ὅτε, hιαρός = ἱερός (West Greek only), and φίκατι /wikati/
‘twenty’ (West Greek or Aeolic). From this, it has been concluded that
Pamphylian is a mixed dialect, possibly reflecting an original Mycenaean
settlement with a superposition of later West Greek and Aeolic strata (Brixhe
1976: 149; 2013: 189-203).


11.3.7 Branching and Dating: Tentative Conclusions 


In sum, the most likely scenario is as follows (see the tentative tree in Figure 11.1).
In the first centuries of the second millennium, Proto-Greek was undifferentiated,
although there was no doubt some variation, as well as affinities with other Balkan
languages.³⁷ Around 1700 BCE, South Greek-speaking tribes penetrated into Boeotia,
Attica, and the Peloponnese, while North Greek was spoken roughly in Thessaly,
parts of Central Greece, and further North and West (up to Epirus, and perhaps also
Macedonia). During the early Mycenaean period, South Greek diverged by the
assibilation of *ti, the simplification of word-internal *ts and *ss, and a number of
morphological innovations.


37 Scholars often date the immigration into the Peloponnese to the end of the third millennium, but 
I would prefer a later date coinciding with the beginning of Late Helladic, in the seventeenth 
century BCE (cf. Hajnal 2005). This would fit the linguistic data best, as reconstructible 
differences between South Greek and North Greek in the late Mycenaean period are relatively 
small.

Figure 11.1 The Greek dialects


At some point, probably still in the Mycenaean period, Proto-Aeolic developed
as a result of changes such as *r̥ > ρο, labial reflexes of all remaining
labiovelars, and the creation of 3rd decl. dat.pl. -εσσι. Proto-Aeolic can be
reconstructed if the South Greek features of Lesbian and the West Greek 
features of Boeotian can be ascribed to contact with Ionic and West Greek, 
respectively, in the late Dark Ages. Alternatively, the precursors of Lesbian and 
Boeotian in the Mycenaean period may have been bridge dialects linking 
Thessalian with South Greek and West Greek, respectively. 

In the thirteenth-twelfth century BCE, then, there were (at least) three larger 
dialect areas: South Greek on the Peloponnese and in Attica and Boeotia; 
Aeolic in Thessaly, and West Greek in North-Western regions. Moreover, in 
the same period Proto-Ionic also started to diverge from Mycenaean-like 
dialects (Proto-Arcado-Cypriot). We are in the dark, however, about the dia- 
lects spoken in Central Greece, and not all dialects spoken in this period need 
have survived. 

The traditional concept of Dorian migrations in the twelfth and eleventh 
centuries is still the best way to explain the isolated position of Arcadian and 
the specific institutions shared by various Dorian states. Many defining char- 
acteristics of the first-millennium dialects (including isoglosses shared between 
Proto-Ionic and West Greek) took shape in the Dark Ages through convergent
developments; this means that the situation in the second millennium may have
been quite different (cf. the discussion about the position of Aeolic), and many 
specific details cannot be recovered. 


11.4 The Relationship of Greek to the Other Branches 


11.4.1 Greek and Macedonian


Macedonian is known from various Greek-like personal names, some glosses
in Hesychius, and probably from a curse tablet found at Pella, containing an
unknown form of Greek resembling NWGr. dialects (SEG 43.434, c. 380–350
BCE, Hatzopoulos 2007). To this might be added an oracular consultation on
a lead tablet found at Dodona (Méndez Dosuna 2012: 144–5). The Pella curse
tablet shares some typical features with NWGr. dialects: apocope in the
preverb κατ-, dat. pron. ἐμίν vs. ἐμοί, and a temporal adverb in -κα. On the
other hand, scholars have traditionally viewed Macedonian as a separate
language closely related to Thracian and Phrygian on account of reflexes of
the “voiced aspirates” written ⟨β δ γ⟩ (e.g. Βουλομάγα = Φυλομάχη).
However, this does not explain e.g. the reflex of *gʰ- in the name Κεβάλιος
(cf. Gr. κεφαλή): if Macedonian had a Thraco-Phrygian-like development,
one would expect *Γεβάλιος. Moreover, since there is also evidence that
voiceless stops were voiced between vowels and in contact with sonorants
(e.g. Διγαία = Att. Δικαία, Δρεβέλαος = Att. Τρεφέλεως), it is proposed (cf.
Méndez Dosuna 2012) that ⟨β δ γ⟩ may represent both voiced fricatives
(from *pʰ tʰ kʰ) and normal voiced stops (*p t k); finally, Κεβάλιος presupposes
that Macedonian took part in Grassmann’s Law. If this is correct, Macedonian
started off as a NWGr. dialect which subsequently underwent its own
Lautverschiebung in the stops. Caution is obviously necessary in view of
the limited evidence.


11.4.2 Greek and Phrygian 


Greek is clearly more closely related to Phrygian than to any of the main
branches of Indo-European: there are shared phonological, morphological
and lexical innovations.³⁸ This close correspondence is all the more remarkable
given the fragmentary attestation of Phrygian. The view that Phrygian and 
Armenian are especially closely related, already expressed in ancient authors, 
is not based on compelling evidence (cf. Obrador-Cursach 2019: 240-2; contra 
Lamberterie 2013). 


38 See Neumann 1988, Lamberterie 2013 and Obrador-Cursach 2019 on Graeco-Phrygian, and
Ligorio & Lubotsky 2018 for a recent encyclopedic treatment of Phrygian.

Phrygian shares phonological innovations such as the following with Greek:

* a threefold reflex of PIE *CRHC is proven by MPhr. γλουρεος ‘golden’ (cf.
γλούρεα· χρύσεα. Φρύγες ⟨καὶ⟩ γλουρός· χρυσός, Hsch. γ 659), corresponding
to Greek χλωρός ‘bay, pale; green’ < PIE *ǵʰl̥h₃-ró-; this development is
not shared with any other Indo-European language

* a threefold reflex of word-initial *HC-, cf. NPhr. αναρ < *h₂nēr (Gr. ἀνήρ),
OPhr. onoman (Gr. ὄνομα)³⁹

* triple reflex of PIE *CHC: Phr. -μενος < *-mh₁nos, as in Greek

* lenition of prevocalic *s, word-initially (NPhr. εγεδου = Gr. ἐχέσθω < *seǵʰ-)
and after a vowel (NPhr. δεως = Gr. θεοῖς < *dʰh₁s-ó-), as well as in *sw-

* loss of word-final occlusives: 3sg. impv. -του = Gr. -τω < *-tōd.

Note that Phrygian is a centum language: cf. OPhr. egeseti, NPhr. εγεδου <
*seǵʰ-e/o-, MPhr. γλουρεος < PIE *ǵʰl̥h₃-ró- plus *-ei̯os. Other phonological
innovations led to differences with Greek, but none of them has to be early:

* the labiovelars were merged with the pure velars and palato-velars: NPhr.
κναικαν = Gr. γυναῖκα

* the PIE voiced obstruents developed into voiceless stops (Lubotsky 2004):
acc. Τιαν = Ζῆν(α), gen. Τιος = Διός, dat./instr. Τι(ε) = Διί, Δί, as well as acc.
κναικαν ‘wife’ = Gr. γυναῖκα.

The following morphological isoglosses are relevant: 

* OPhr. (probably 3sg. opt.) kakoioy, kakuioy, probably a counterpart to
Greek κακόω ‘maltreat’ with preserved intervocalic yod; both the type
of factitive formation and the lexeme are exclusive to Phrygian and
Greek

* OPhr. avtos, an exclusive isogloss with Gr. αὐτός ‘self’, cf. (38) above; the
combination OPhr. venavtun, with secondary -n-, neatly matches Gr. ἑαυτόν
‘himself’ < *swe auton

* the suffix *-eu- in Greek masculine nouns in -εύς seems to be matched by
(apparently thematized) OPhr. -avo-

* NPhr. 3sg. εγεδου, probably a middle imperative, is paralleled by Gr. -σθω
(possibly a common innovation, Ligorio & Lubotsky 2018: 1828)

* the middle perfect ptc. in -μενος < *-mh₁nos (formed in an identical way in
Greek).

Phrygian preserves several morphological archaisms that Proto-Greek lost. The
3pl. perfect ending *-er is probably continued in NPhr. δακαρεν ‘they established’
(*-er plus *-ent). On the whole, however, the Phrygian verb displays
many innovations, even if most details are still unclear.


39 The Armenian reflexes of these words (ayr ‘husband’, anun ‘name’) also have “prothetic
vowels”; this is often interpreted as a common development of “Balkanindogermanisch” (cf.
Hajnal 2003), but the laryngeals developed differently in Armenian in other environments,
whereas there are no discernable differences between Greek and Phrygian.




Lexically, the following items are important: 

* Phryg. knaikan ‘woman, wife’ beside Gr. γυναῖκα, reflecting PIE *gʷen-h₂,
*gʷn-eh₂- with an additional suffix -ik- (or -i-k-: cf. Armenian pl. kanai-kʿ
‘women’ without the k-suffix)

* Gr. ὄνομα ‘name’ and Phryg. onoman ‘id.’ with a zero grade root (also
attested elsewhere, but contrast Latin nōmen, Vedic nā́man-, Armenian
anun < *o/anōmn̥)⁴⁰

* Phr. δεως (instr.pl.) and Gr. θεός reflect PIE *dʰh₁s-ó- ‘god’, while most other
languages have a reflex of *dei̯u̯o-

* NPhr. υψοδαν, if reflecting an adverb *ups-o-dʰn̥ ‘above’, forms a near-precise
match with Gr. ὑψόθεν ‘on high; from above’ (Lubotsky 1993).

Notwithstanding the fragmentary attestation of Macedonian and Phrygian, it
seems likely that their ancestors formed a linguistic unity with (pre-)Proto-Greek
in the late third and early second millennium BCE, presumably somewhere
on the southern Balkans (Macedonia, Thracia), before Hellenes penetrated
into Thessaly and further south. The relationship to other Balkan languages
remains quite uncertain. Hajnal (2003) collects some possible evidence for
prehistoric contacts between Ancient Balkan languages, including the
appurtenance suffix -eio- (attested in Greek, probably in Phrygian kubeleya, possibly in
Venetic and Messapic, but not elsewhere) and the innovative dat.-loc. ending -si
(probably found in Albanian -sh),⁴¹ but there is not enough evidence for drawing
solid conclusions.


11.4.3 Greek and Armenian 


The possibility of a closer relation between early forms of Greek and Armenian
has attracted scholarly attention since the works of Meillet and Pedersen. In
more recent times, a genealogical connection has been argued for by Olsen &
Thorsø (Chapter 12) and Lamberterie (1997; 2013). Skepticism has been
voiced by Clackson (1994) and, recently, Kim (2018). Indeed, there are no
phonological isoglosses that must be distinctive innovations shared exclusively
by Greek and Armenian, and what are probably the earliest phonological
innovations of Armenian are generally not matched by Greek counterparts.
Furthermore, shared morphological innovations cannot be demonstrated
(Clackson 1994: 60–87).

40 Phrygian onoman renders highly unlikely the idea that the initial vowel of Laconian
Ἐνυμακρατίδας directly reflects PIE *h₁nh₃-mn̥ and that ὄνομα arose by vowel assimilation
(cf. Lamberterie 2013: 34 with references). The root of ‘name’ must therefore be PIE *h₃neh₃-.

41 Note that the existence of a Phrygian dative in -ωσι (admitted by Hajnal 2003) is uncertain.

Having said this, certain lexical isoglosses remain suggestive, especially
those that combine semantic and morphological developments. For an overview
of lexical correspondences between Armenian, Greek, and Indo-Iranian,
see Martirosyan (2013) and Olsen & Thorsø (Chapter 12), though part of the
material consists of shared retentions and independent borrowings. The
following examples are among the strongest:

* Gr. ἦμαρ < *āmr̥ ~ Arm. awr ‘day’ < *āmōr or *āmr̥ (cf. Kim 2018: 252),
a (near-)perfect word-equation: this isogloss of core vocabulary is exclusive
to Armenian and Greek, but Ved. áhar (gen. áhnas) and Av. aiiarə ‘day’ look
suspiciously similar to each other and to the Graeco-Armenian word. It
cannot be ruled out that *āmr̥ reflects an archaism of PIE (Clackson 1994:
97; Pinault 2017).

* The full grade root of δηρός and Arm. erkar ‘long’ < *du̯āro- is certainly an
innovation of both branches, whether it is the phonological outcome of
*duh₂-ró- or an analogical reshaping *du̯eh₂-ro- after the adverb *du̯eh₂m
(cf. Gr. δήν, Arm. erkayn < *du̯ān-i̯o-, Old Hittite tūu̯az ‘from afar’).

* The reduplicated aor. *ar-ar-e/o- (Arm. arari ‘made’, Gr. ἤραρον ‘fixed’)
looks like an innovation: full reduplication with vowel-initial roots was
productive in Greek, but not in PIE or Armenian; on possible reconstructions
of the pre-form, see Willi 2018: 80–2, who prefers the scenario that an
original *h₂e-h₂r-e/o- (> *āre/o-) was restored as *h₂r-h₂r-e/o- before the
laryngeals were eliminated.

* Gr. θερμός and Arm. ǰerm ‘warm’ < *gʷʰer-mó-, with e-grade root as opposed
to the o-grade in most other branches (Lat. formus, Eng. warm). The innovation
seems due to influence of the precursor of θέρομαι ‘become hot’ (rather
than that of the nominal form θέρος ‘heat, summer’, as per Lamberterie 2013:
20), cf. also the noun Alb. zjarm ‘fire’ and perhaps the Phrygian toponym
Γέρμη, Germe.

* *mr̥to- ‘mortal, man’: this combination of form and meaning occurs only in
Gr. βροτός and Arm. mard (Lamberterie 1997); in Indo-Iranian *mr̥tá- means
‘dead’, as expected.

* The root *h₃bʰel- underlying Gr. ὀφέλλω ‘to be useful, cause to grow’, ὄφελος
‘benefit’ reappears in Arm. y-awelum ‘to add to’, aor. y-aweli, adv. aweli
‘more’; the homonymous root of ὀφέλλω ‘sweep’, ὄφελμα ‘broom’ (both
only in Hipponax) recurs in Arm. awel ‘broom’. The root is not attested in
other branches. Clackson (1994: 157) argues that the meaning ‘sweep’ is
original; Greek and Armenian would both preserve the derived meaning
‘increase’, too.

* Gr. ψεύδομαι ‘deceive, lie’, ψεῦδος ‘lie’ with Arm. sowt, gen. stoy ‘false’: the
root is not attested elsewhere.

Whether such examples are sufficient for reconstructing a Graeco-Armenian
node remains uncertain, as the lack of ascertained common morphological
innovations is worrying. The strongest cases by comparison are:

* Arm. 1sg. middle -m may match Greek -μαι, but Albanian and Tocharian also
have an m-ending, so independent innovations cannot be excluded.

* The parallels in the formation of nasal present stems in both branches seem
suggestive, but they are not numerous and are often inexact. Since double
infix presents of the type λαμβάνω are productive in Greek beside thematic
aorists, they need not be genetically related to Armenian presents in -anem.
Thus, Arm. lkʿanem ‘leave’ has been compared to λιμπάνω, but the latter is
not attested in Homer and may be a productive creation based on ἔλιπον
(replacing λείπω), while the idea that Arm. lkʿanem < *likʷ-ane/o- arose from
*linkʷn- by dissimilation remains conjectural.

* Gr. οὐ, οὐκ ‘not’ and Arm. očʿ have been derived from *(ne) ... h₂oi̯u kʷid by
Cowgill 1960. However, Clackson (2005: 155–6) argued that očʿ originally
meant ‘no one’ and goes back to o- (as in okʿ ‘anyone’ and omn ‘someone’)
plus an older negation *čʿ (as in čʿikʿ ‘nothing’) that developed from
*(ne) ... kʷid. Since the loss of *ne (e.g. French pas, rien, etc.) and the
development from indefinite ‘no one’ to ‘not’ (e.g. Eng. not, Germ. nicht <
*ni wihti ‘nothing’) are both easily paralleled, the value of this isogloss is
limited.

Finally, a number of alleged exclusive isoglosses are less strong than they
seem:

* Gr. κίων ‘pillar’ matches Arm. siwn ‘id.’ < PIE *ḱiHu̯ōn, but the formation
may have been present in Indo-Iranian, too (cf. Martirosyan 2013: 119,
following Lubotsky).

* Arm. merj ‘near’ and Gr. μέχρι ‘as long as, until, etc.’ may reflect the same
formation *me-ǵʰsr-i ‘at hand’, but the semantic divergence between merj
and μέχρι is considerable (cf. Clackson 1994: 150-1), and *me-ǵʰsr-i would
have to be an archaism of PIE.

* Arm. artewan, gen.pl. -acʿ ‘eyebrow’ yields an exact correspondence to Gr.
δρεπάνη ‘sickle’, with a metaphorical meaning of the body part in Armenian.
However, the fact that δρεπάνη looks like an instrument noun productively
derived from δρέπω ‘pluck’ casts doubt on its antiquity. Could the word be
a borrowing from Anatolian Greek into pre-Armenian (cf. Clackson
1994: 190)?

* Gr. πρέπω ‘be conspicuous’ (Hom.) with Arm. erewim ‘appear’ might be an
exclusive lexical isogloss if the pre-form is *prep-, though OIr. richt ‘form,
species’ might derive from *pr̥ptó-. Alternatively, if Ved. instr. kr̥pā́ ‘beauty’
is related, the root would be *kʷrep-, and the verb a retained archaism.

* The word for ‘goat’ is Arm. ayc (i-stem) and Gr. αἴξ, αἰγός. Both derive from
*aiǵ- or *h₂eiǵ-; the latter is to be preferred if Av. izaēna- ‘of leather’
contains an ablauting root variant. A PIE word for ‘goat’ is difficult to
reconstruct, and probably a borrowing.

* The meaning ‘laugh’ of the root *ǵelh₂- (Gr. γελάω ‘laugh’, γέλως ‘laughter’;
Arm. całr ‘id.’, gen. całow) is a shared innovation. If the root of Lat. gelidus
‘cold’, gelu ‘ice’ is related (suggested by Clackson 1994: 131, positing
a development ‘shine’ > ‘ice’), the root itself is an archaism. In this case, the
lexical development to ‘smile, laugh’ may have taken place in PIE, with Gr.
preserving the older root meaning ‘resplendent/icy calm’ beside it.

* The formations of Arm. nor ‘young’ < *neu̯o-ro- and dalar ‘green’ <
*dʰl̥H-ro- are not identical with Gr. νεαρός ‘juvenile, fresh’ and θαλερός
‘abundant, fertile’, respectively (note the different meaning of the latter).
A relatively recent derivation of νεαρός and θαλερός within Greek is more
likely (van Beek 2021b).

To conclude, I fully concur with Kim's words (2018: 263): 


[T]he list of linguistic innovations exclusively shared by Greek and Armenian is 
overwhelmingly composed of lexical items. Furthermore, most of these involve general 
root cognations, not full word equations allowing for reconstruction of an intermediate 
preform, which raises the possibility that they are either (partial) independent creations 
or even borrowings from a third language. In this respect, the relationship between 
Greek and Armenian differs greatly from that of Indo-Aryan and Iranian, or Baltic and 
Slavic, where it is possible to reconstruct dozens of distinct lexical preforms for Proto- 
Indo-Iranian and Proto-Balto-Slavic, respectively. 


11.4.4 Greek and Albanian


I cannot discuss the evidence for common innovations of Greek and Albanian in
any detail here; for a list of potential cases, see Chapter 12, where Hyllested and
Joseph adduce some interesting examples, such as the element *ḱi̯ā- (contained in
both Alb. sot ‘today’ and Greek τήμερον ‘id.’). However, a number of Greek
innovations adduced there can or must in my view be dated later than Proto-Greek.
I am not convinced of a close genetic relation between Greek and Albanian.


11.5 The Position of Greek 


The further position of Graeco-Phrygian in the family tree is not easy to
determine. It is customary, and indeed plausible, to include Greek in
a putative group of “Central” Indo-European languages (including Armenian,
Indo-Iranian, and probably other satem languages) that remained in the
homeland after the departure of Anatolian, Tocharian, Italo-Celtic, and perhaps
Germanic. However, as with Graeco-Armenian (Section 11.4.3), the strongest
affinities with Indo-Iranian are lexical (Euler 1979). Further qualitative
linguistic evidence for “Graeco-Aryan” is meagre. In the phonological domain there
are no demonstrable shared innovations (cf. Section 11.2 on the syllabic
nasals), and those Greek innovations that are difficult to duplicate are without
parallels in the other branches (e.g. the voiceless aspirate stop series, the double
outcome of initial yod). In verbal morphology, Greek and Indo-Iranian
preserved more archaisms than most branches, partly because of their early
attestation: these include the distinctions between active and middle voice,
three different “tense-aspect” stems (present, aorist, and perfect), subjunctive
and optative, and so on.

It is often asserted that certain similarities between the verbal systems of 
Greek and Indo-Iranian are common innovations. Thus, the augment, the 
middle perfect, and the pluperfect are ascribed to this late stage of PIE. 
However, the augment may well be an archaic feature. Given that Indo-Iranian
uses the stative ending *-o in the middle perfect while Greek uses
middle *-to, an independent innovation of this formation is possible. This
leaves us with the creation of primary middle endings in -i, which might be 
shared with Indo-Iranian and Germanic, and the use of the originally contrast- 
ive suffix *-tero- in comparative adjectives (shared only with Indo-Iranian). 

In sum, from a qualitative angle it remains uncertain when exactly Greek 
(Graeco-Phrygian) branched off from Nuclear PIE. There are no indications for 
an early separation (which would require demonstrating a common innovation 
of most other branches that Proto-Greek did not undergo). A relatively late 
departure therefore seems likely, but the evidence for this is mainly lexical.


12 Armenian


Birgit Anette Olsen & Rasmus Thorsø 


12.1 Introduction 


The attestation of the Armenian language begins in the early fifth century
when, according to tradition, the clergyman Mesrop Maštocʿ invented the
Armenian script for the purpose of translating the Bible. This century marks
the initial period, the “golden age” (oskedar) of Classical Armenian or grabar
(written language). Besides the Bible, the earliest texts consist of translations
from Greek and Syriac, but also a number of original works. These include for
example Eznik’s “Refutation of the sects”, Koriwn’s “Life of Maštocʿ” and,
a little later, the historical works by Agatʿangełos, Pʿawstos Bowzand, Łazar
Pʿarpecʿi and Ełišē. However, a few graffiti and inscriptions and a papyrus
containing a sort of Greek phrasebook written in Armenian script are the only
tangible monuments from the fifth century (see Orengo 2017: 1031–4). The
literary sources are only transmitted in much later manuscripts, the oldest of
which go back to the late ninth century, which means that we cannot really be
certain that they faithfully reflect the actual language spoken at least 400 years
earlier.

Besides the classical learned and religious language that was still in use,
a new written standard, based on western dialects, was created to serve the
practical purposes of the state of Cilicia during the thirteenth and fourteenth
century, but after the fall of the Armenian kingdom in 1375, there was no
administrative system to support a written norm adapted to the spoken
language. From the seventeenth century, a lingua franca, vačaṙakanakan hayerēn
‘merchant’s Armenian’ (Orengo 2017: 1034–5), containing various dialectal
features, gradually split into the two varieties of modern Eastern and Western
Armenian, whose standards were fixed by the end of the nineteenth century. Of
these, Eastern Armenian is the official language of the Armenian Republic, but
also spoken in Arcʿax (Nagorno Karabagh) and Iran, while Western Armenian
as the language of the diaspora following the genocide in 1915 survives in
bilingual communities in e.g. Lebanon, Syria, Israel, France, Canada and
the USA.


12.2 Evidence for the Armenian Branch


This section contains a list of phonological and morphological features that 
distinguish Armenian from other branches of the Indo-European family. 


12.2.1 Phonological Innovations 


The most important phonological innovations characterizing the Armenian 
branch are listed below. 


Vowels and Semivowels 

1. Raising of long *ē and *ō to i and u (written ow) respectively, cf. sirt ‘heart’
< *ḱērd-, towr ‘gift’ < *doh₃ro-.

2. Raising of short *e and *o to i and u before nasals, cf. hin ‘old’ < *seno-,
cown-r ‘knee’ < *ǵonu-.

3. Loss of basic length opposition for all vowels: *ā, *ī and *ū merge with
their short counterparts, cf. mayr ‘mother’ < *mah₂tēr and acem ‘lead,
bring’ < *h₂aǵ-e-.

4. Merger of front diphthongs *ei/*oi into ē (a mid-high, eventually short
vowel, distinguished from the more open e), cf. e-dēz ‘piled up’ < *(h₁)e-dʰeiǵʰet,
mēg ‘cloud’ < *h₃moigʰo-. While *ou yields oy, cf. boys ‘plant,
herb’ < *bʰou(h₂)ko-, the usually assumed parallel merger of back diphthongs
*eu/ou > oy may not be correct. Thus, Lamberterie (1982: 81-82)
assumes a development *eu > iw, e.g. hiwcanim ‘pine away’ < *seuǵ-/seug-
(OE sēoc, Goth. siuks). See also Olsen 2020.

5. Loss of tonal accent and fixation of stress, at first on the penultimate 
syllable, eventually leading to syncope of all final syllables. With few 
exceptions, stress is thus synchronically fixed on the final syllable. 

6. At a later stage than (5), weakening of unstressed high vowels and diphthongs,
whereby i and u become [ə] (usually unwritten), ē becomes i, oy
becomes u, while ea becomes e.² Compare e.g. nom.sg. sirt ‘heart’, gen. srti
[sərˈti]; sēr ‘love’, gen. siroy; loys ‘light’, gen. lowsoy; aṙakʿeal ‘messenger,
apostle’, gen. aṙakʿeloy.

7. Vocalic resonants *r̥, *l̥, *m̥, *n̥ generally yield ar, ał, am, an, cf. mard ‘man,
mortal’ < *mr̥tó-, Gr. (Aeol.) βροτός, cf. also Ved. mr̥tá- ‘dead’.

8. While intervocalic *i̯ is lost, like in e.g. Greek, the reflex in initial position is
not clear. Options include:

a. ǰ- as in ǰowr ‘water’ < *i̯uHr-o-, Lith. jū́ra ‘sea’


1 For various attempts at establishing a relative chronology of the Armenian sound changes, see
Kortlandt 1980a; Ravnæs 1991; Job 1995. A recent summary of Armenian historical phonology
is presented by Macak (2017). See also the general surveys by Meillet (1936); Solta (1963);
Godel (1975); Schmitt (1981); Lamberterie (1989); Olsen (2017b).

2 The diphthong ea results from both *ea and *ia arising after the loss of intervocalic consonants.

b. j- as in jow ‘egg’ < *i̯ōi̯o- vel sim.

c. zero as in nēr ‘daughter-in-law’, Lat. ianitrīcēs.³ Perhaps also ors ‘hunt,
game’ if < *i̯orko- (thus Martirosyan 2010: 706).

An apparent reflex l should probably be explained by other processes. In
leard ‘liver’ < *i̯ekʷr̥t, contamination with *leip- ‘fat, lard’ is conceivable,
cf. OHG lebara ‘liver’. Similarly, the word lowc ‘yoke’ could have been
secondarily affected by the verb lowcanem ‘to loosen, untie’.

9. Initial *u̯- yields g-, cf. get ‘river’ < *u̯ed-os-. The internal outcome is more
complex and alternates between g, w and zero.⁴ It is possible that these
reflexes result from a relatively late phonemic split of an intermediary *ɣʷ,
which seems to be indirectly attested in Georgian ɣvino ‘wine’, if borrowed
from an earlier form of Arm. gini ‘id.’ < *u̯oin-i̯o-. Note also Geo. ɣvia
‘juniper’, Arm. gi ‘id.’ (HAB 1: 554).
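
The vowel weakening described in (6) can be illustrated procedurally. The following Python sketch is a toy model only, resting on assumptions that are not part of the analysis above: it works on the transliteration used in this chapter (with ow standing for u), it assumes that the last vowel nucleus of the stem is the one that loses the stress when an ending is added, and the names WEAKENING, weaken_stem and genitive are hypothetical.

```python
# Toy model of the weakening in (6): i, u (written "ow") > zero in writing,
# ē > i, oy > u (written "ow"), ea > e. Illustrative sketch only.
WEAKENING = {"ea": "e", "oy": "ow", "ē": "i", "i": "", "ow": ""}
STABLE_VOWELS = "aeo"  # simple vowels that are not affected by the rule


def weaken_stem(stem: str) -> str:
    """Weaken the last vowel nucleus of a stem that has lost the stress."""
    # Scan from the right until the last vowel nucleus is found.
    for end in range(len(stem), 0, -1):
        prefix = stem[:end]
        # Digraphs are tested before single vowels so that "ea" is not read as "a".
        for nucleus in ("ea", "oy", "ow", "ē", "i"):
            if prefix.endswith(nucleus):
                return prefix[: -len(nucleus)] + WEAKENING[nucleus] + stem[end:]
        if prefix[-1] in STABLE_VOWELS:
            return stem  # the last nucleus is a, e or o: nothing changes
    return stem


def genitive(stem: str, ending: str) -> str:
    """Attach a stressed ending to the weakened stem."""
    return weaken_stem(stem) + ending


for stem, ending in [("sirt", "i"), ("sēr", "oy"), ("loys", "oy"), ("aṙak'eal", "oy")]:
    print(stem, "->", genitive(stem, ending))
```

For the four examples cited in (6), the sketch returns srti, siroy, lowsoy and aṙak'eloy. It makes no attempt to model exceptions or the conditions under which the schwa is actually written.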


Laryngeals 

10. Loss of consonantal laryngeals would be consistent with the development in
the other non-Anatolian languages and thus not a specific Armenian feature. It
has been claimed that initial *h₂- and *h₃- are preserved as h- before an
original e, e.g. haw ‘bird’ < *h₂eu̯i-.⁵ There are, however, a number of
problematic counterexamples, and the hypothesis requires several ad hoc
reconstructions (Olsen 1999: 766-7; Clackson 2005: 155; Macak 2017: 1059).

11. Laryngeal vocalization in initial position (“prothetic vowel”) before consonants
except *u̯, cf. astł ‘star’ < *h₂stēl for *h₂stēr. It is debated whether
Armenian, like Greek, shows a triple representation, but the evidence for
this claim, most prominently inn ‘nine’ if < *h₁neu̯n̥, is scarce.⁶ Besides,
triple representation of the prothetic vowels would be at variance with the
development in other positions.

12. Vocalization of all laryngeals to a between consonants in initial and final
syllables, cf. keraw (aor.act.3sg.) ‘ate’ < *gʷerh₃-to. In internal syllables the
conditioning of vocalization versus loss is not fully clear (Olsen 1999: 767-8).

13. Double vocalization of *RHC > aRaC, cf. haraw ‘south’ < *pr̥h₃u̯V-.

14. Vocalization of at least *h₂ after *i/u in auslaut as in Greek, cf. sterǰ ‘sterile’
< *steri̯a- < *ster-ih₂-. It cannot be excluded that this was a morphologically


3 The exact reconstruction is difficult, but perhaps *(h)i̯enh₂tēr > *(h)i̯entēr (deletion of internal
laryngeal) > *(h)i̯inēr (*-en- > *-in-; *-nt- > -n-) > nir- (*ē > -i-; syncope of unaccented *-i-) >
analogical nom.sg. nēr, cf. the pattern sēr, siroy ‘love’ (Olsen 1999: 190-1).

4 For a discussion of the conditioning, see Eichner 1978: 148-9; Olsen 1986; Ravnæs 1991: 72-3;
Matzinger 1992; Olsen 1999: 787-8.

5 Thus Austin (1942: 22-3), followed by Winter (1965), Greppin (1973), Kortlandt (1980b),
Martirosyan (2010: 712-13) and others.

6 Triple representation is advocated by e.g. Winter (1965), Kortlandt (1987), Beekes (1988, 2003),
and Martirosyan (2010: 765-6). The opinion that all vocalic laryngeals yield a is defended by
Klingenschmitt (1970: 80 and 1982: 105), Olsen (1985 and 1999: 262-4), Lindeman (1987: 75-83),
and others.
motivated change, i.e. a levelling in favour of the oblique cases where
*-i̯a- < *-i̯ah₂-. On the other hand, there is evidence to suggest vocalization
of internal *-ih₂/₃- and *-uh₂/₃- > *-i̯a-/*-u̯a- as well (cf. Olsen
1992; 1999: 770-1), similar to the “breaking” in Greek and Tocharian
(cf. Section 12.4.1), though this is not widely accepted.


Other Consonants and Clusters 

15. Primary palatalization: the PIE palatals *ḱ, *ǵ and *ǵʰ yield s, c and
j respectively.

a. At an earlier stage, (labio)velars had become palatals after *u (including
u-diphthongs), cf. dowstr ‘daughter’ < *dʰugh₂tēr, loys ‘light’ < *le/ouko-.

16. Chain shift of the remaining PIE stops: 

a. PIE voiceless stops *t and *k become t' and k' respectively, while
*p usually becomes h (via *pʰ and/or *f), disappearing before o, cf.
het ‘footstep’ < *pedom vs. otn ‘foot’ < *podm̥.

b. PIE voiced stops *b, *d and *g become p, t and k.

c. PIE voiced aspirated stops *bʰ, *dʰ and *g(ʷ)ʰ become b, d and g.

17. Lenition or loss of particular voiceless and voiced aspirated stops. The cir- 
cumstances are complex, but at least the following developments are fairly 
certain: 

a. intervocalic *p and *bʰ > w, cf. ew ‘and’ < *h₁epi, -(a)wor ‘carrying’
< *-bʰorah₂-

b. intervocalic *t > y before front vowels, cf. hayr ‘father’ < *ph₂tēr;
intervocalic *t > w before back vowels, cf. cnaw (aor.3sg.) ‘was born’
< *(e-)ǵenh₁-to; when not following the stressed syllable, intervocalic
*t disappears entirely, cf. č'ork' ‘four’ < *kʷetores

c. intervocalic *ǵʰ > z, cf. lezow ‘tongue’ < *leiǵʰ-uh₂-

d. intervocalic *gʷʰ (> *ǰ) > ž before front vowels, cf. iž ‘snake’ < *h₁egʷʰi-
(apparently no examples of *-gʰ-)

e. internal *-pt- > -wt'-, cf. ewt'n ‘seven’ < *septm̥

f. internal *tR, *kR, *ḱR > wR, cf. arawr ‘plough’ < *h₂arh₃tro-, mawruk'
‘beard’ < *(s)moḱru-

g. internal *-pn- > -wn-, cf. k'own ‘sleep’ < *su̯opno-

h. initial voiceless stops are lost before resonants, cf. li ‘full’ < *pleh₁to-

i. initial *pt- > t'-, cf. t'er ‘side; leaf’ < *pter-.⁷

18. Secondary palatalization of (labio)velars. This development is most clearly
seen in č'ork' ‘four’ < *kʷet(u̯)ores and ǰerm ‘warm’ < *gʷʰermo-.⁸ This


7 The seemingly missing lenition of *k and *g(ʷ)ʰ (cf. Kortlandt 1980a; Kümmel 2017) and the
outcome of lenited *dʰ (z or r, cf. Jasanoff 1979: 143-4; Martzloff 2016) are subject to debate.

8 There are no examples involving *k, *gʰ or *gʷ. Considering the evidence at face value thus
leaves an asymmetrical pattern, which is why it is sometimes assumed that palatalization affected
all velars (Kortlandt 1975). Numerous exceptions such as keam ‘to live’ < *gʷi̯eh₃- would thus
require analogical explanations which are not always straightforward.
feature is perhaps not exclusively Armenian (cf. Section 12.4.3), but
another uniquely Armenian rule, the “awcanem-rule” (Kim 2018: 258),
proves the preservation of labiovelars into the immediate prestage of
Armenian: *VnKʷ > *VwḰ (cf. 15a), e.g. *h₃n̥gʷ- > awc(anem) ‘anoint’.

19. While the general reflex of *s is h/Ø, much as in Greek, conditioned
developments are subject to more controversy.

a. To explain the usual nominal and pronominal ending of the nom.pl. -k',
it is suggested by e.g. Pedersen (1905: 209-227) and Kortlandt (1984)
that it is the regular outcome of final *-s.

b. A ruki-like development of final *-s > -r after i and u (including *ē and
*ō following [1]) may explain intricacies such as singular aorist imperatives
like towr ‘give’, which could then reflect the original injunctive
*doh₃-s (cf. Pedersen 1905: 228; Olsen 1989).

20. Metathesis in clusters of voiced (aspirated) stops and resonants whereby
e.g. *-dr-, combined with the sound shift (16), yields -rt- with initial vowel
prothesis, cf. artawsr ‘tear’ < *draḱu-, merj ‘near’ < *me-ǵʰsr-i.

21. Epenthesis of *i and *u caused by an *i or *u in the following syllable, cf. ayl
‘other’ < *h₂ali̯o-, awł-i ‘strong alcoholic drink’ < *h₂alu-. While these
changes are not spontaneous, the conditions are not fully clear. It seems that
i-epenthesis only took place before resonants and after the vowels a and
o while u-epenthesis was restricted to a rather different environment, also
after i (perhaps e) and before stops, cf. giwt ‘discovery’ < *u̯id-(t)u-. On the
other hand, it is not found in well-established u-stems such as asr ‘wool’
< *poḱu- and e.g. Beekes (2003: 205) is sceptical of its existence altogether.
Perhaps the original place of accent played a role in the development of
u-epenthesis (see Olsen 1999: 798-801 with references).

22. Particular developments of various clusters including

a. *sK, *Ks > c' in most cases, cf. c'elowm ‘split, break’ < *skelH-; vec'
‘six’ < *su̯eḱs. Initially, the outcome š- may sometimes be observed, and
might be the result of palatalization before front vowels. Alternatively,
Martirosyan (2010: 516) suggests that š- regularly develops from *sKHV-
as opposed to *sKV- > c'-. It is debated whether -č'- is the palatalized
version of *-sK- in internal position or should be derived from *-sKi̯-.

b. *dʰi̯ > ǰ, cf. mēǰ ‘middle’ < *medʰi̯o-. The outcome of *ti̯ and *di̯, either c'/c
or č'/č, is more controversial (see e.g. Olsen 1993, Kocharov 2019: 30-1).

c. *Ri̯ > Rǰ, cf. sterǰ ‘sterile’ < *sterih₂-.

d. *su̯, *tu̯ > k', cf. k'oyr ‘sister’ < *su̯esōr.

e. *du̯ > (V)rk-, cf. erkow ‘two’ < *du̯ō.⁹
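
The chain shift in (16) is best thought of as a simultaneous remapping of the three stop series rather than as a sequence of ordered replacements, since otherwise *d > t would feed *t > t'. The following Python sketch illustrates this point under simplifying assumptions: the input forms are reduced to plain letters plus ʰ, only the changes listed under (16a-c) are applied (no lenition, palatalization or vowel changes), and the names SHIFT, STOP and shift_stops are hypothetical.

```python
import re

# Simultaneous mapping of the PIE stops according to (16a-c).
SHIFT = {
    "bʰ": "b", "dʰ": "d", "gʰ": "g",  # voiced aspirated > voiced
    "b": "p", "d": "t", "g": "k",     # voiced > voiceless
    "t": "t'", "k": "k'", "p": "h",   # voiceless > aspirated (or h for *p)
}

# Aspirated stops are listed first so that "dʰ" is not split into "d" + "ʰ".
STOP = re.compile(r"[bdg]ʰ|[bdgptk]")


def shift_stops(form: str) -> str:
    """Apply the stop shift in a single pass over the input form."""
    def replace(match: re.Match) -> str:
        stop = match.group(0)
        following = form[match.end():match.end() + 1]
        if stop == "p" and following == "o":
            return ""  # *p disappears before o, cf. otn vs. het
        return SHIFT[stop]

    # re.sub never rescans its own output, so the new t from *d is not shifted again.
    return STOP.sub(replace, form)


for form in ["pedom", "podm", "bʰer"]:
    print("*" + form, ">", shift_stops(form))
```

Applied to the simplified inputs *pedom and *podm, the sketch yields hetom and otm, i.e. the consonantism of het and otn before the later changes affecting final syllables and resonants.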


9 Others favour a regular development *du̯ > k, cf. Beekes 2003: 199-200. For a more exhaustive
overview of developments in clusters, see Godel 1975: 78-9.
12.2.2 Morphological Innovations: The Verb


The Armenian verb has undergone a number of morphological simplifications,
such as loss of the dual and of the distinction between an optative and
a subjunctive, while the perfect only survives in synchronically opaque
relics.¹⁰ Specific Armenian changes include

23. Generalization of -e- as thematic vowel with the exception of the subj.1pl.
-owk' < *-omes and the participle in -own < *-ont-/*-omh₁no-.

24. Merger of the thematic (or e-stem) endings and the verb ‘to be’ in the 
present active, thus berem ‘I carry’ like em ‘I am’. 

25. Creation of a mediopassive paradigm in -i- from statives in *-eh₁-.

26. Creation of a new imperfect preterite. 

27. Merger of old aorist and imperfective stems for the formation of “root aorists”. 

28. Creation of a “weak” aorist stem in -c'-, possibly a remodelling of the old
s-aorist (cf. Klingenschmitt 1982: 286-7; Olsen 2017b: 443).

29. Formation of a subjunctive morpheme -ic'- of disputed origin.

30. Formation of a causative in -owc'anem, aor. -owc'i, also of disputed origin.

31. Formation of a voice-indifferent infinitive in -l < *-lo-.

32. Formation of a past participle in -eal (o-st.), similar to the Slavic l-participle.


12.2.3 Morphological Innovations: The Noun 


In the noun, the categories of grammatical gender and the dual number are lost,
while an inventory of seven cases is maintained despite several cases of
syncretism. The most notable inflectional innovations include

33. Formation of a gen.dat.abl. plural in -c', e.g. i-st. srtic' from sirt ‘heart’,
possibly originally an adjective in *-(i)-sko-.

34. Introduction of a new abl.sg. ending -ē, probably < *-eti.

35. Introduction of a new loc.sg. ending -i (a-, i- and sometimes o-stems),
probably < *-h₁en.

36. Merger of old root nouns, heteroclitics and s-stems with other stem classes.

37. Creation of a heteroclitic u-/n-stem paradigm from original u-stem adjectives,
e.g. barjr ‘high’, gen. barjow, nom.pl. barjownk' : Hitt. parku-.

38. Creation of a marginal ł-stem paradigm, apparently extended from the
paradigm for ‘star’, astł.

From the field of nominal word formation, the most remarkable innovation
must be:

39. The creation of a complex abstract noun suffix -owt'iwn on the basis of
inherited elements.


10 For more elaborate treatments of morphological innovations, see Klein 2007; Olsen 2017a; 
2017b; Klingenschmitt 1982 on the verb; Olsen 1999 on the noun; Matzinger 2005a on nominal 
inflection. 
12.2.4 Morphological Innovations: The Pronoun


The pronoun is notoriously a word class that is subject to changes and ana- 
logical remodellings, and here Armenian is no exception. However, one feature 
is particularly characteristic: 

40. A systematic distinction between three deictic markers: s for the first 
person, d for the second and n for the third. This system includes the 
postponed articles, -s, -d, -n, the anaphoric pronoun sa, da, na, the demon- 
strative ays, ayd, ayn and various other pronouns, adverbs and 
interjections. 


12.2.5 The Lexicon and Remaining Innovations 


The most remarkable feature of the Armenian lexicon is the scarcity of
inherited lexemes seen in relation to the abundance of loanwords, mostly
from Middle Iranian sources, and words of obscure origin. The etymological
background of around 50 per cent of the Armenian vocabulary is unknown, and
thus an abundance of words that are only attested in this branch helps to define
Armenian as an independent member of the Indo-European family.¹¹


12.3 The Internal Structure of Armenian 


Armenian is generally considered to be a single-language branch and indeed,
Classical Armenian appears to be a highly standardized language with very few
traces of the dialectal diversity that is likely to have existed at the time of
composition. According to Meillet (1904), the later dialects all derive from
a uniform learned κοινή with very few modifications. As examples of dialectal
archaisms, Meillet himself (also 1936: 11) mentions the original dialectal form
lizow ‘tongue’ vs. Classical lezow with umlaut i-u > e-u and the preservation of the
accusative marker z-, mostly lost in the later language, but preserved in the dialects
around Lake Van. Within the Classical language itself, we also find doublets such
as t'aršam/t'aṙam ‘withered’. Another indication of early dialectal differentiation
is the word ays, usually ‘evil spirit’, but also attested in the primary meaning
‘wind’ in Eznik, who explicitly calls it a word of the southerners (Clackson 2005:
154). The fifty to sixty modern Armenian dialects all fall into one of the two main
groups, Western and Eastern, with further subgrouping possible. Some important
criteria for the classification of dialects are the reflection of the Classical Armenian
stops and the formation of the present indicative, where both Western and Eastern
Armenian employ innovative but different formations.¹²


11 See the excellent overview by Clackson 2017.

12 On the topic of dialectal subdivision and the question of dialectal diversity in the earliest
literature, see Adjarian 1909; Martirosyan 2010: 689-704; Martirosyan 2018; Weitenberg 2017.
12.4 The Relationship of Armenian to the Other Branches


In the pre-literary period, there must have been close linguistic contact between
Armenian and a great number of other known and unknown languages, both
Indo-European (as shown especially by the massive layer of Middle Iranian
loanwords) and non-Indo-European. The non-Indo-European
element is responsible for a substantial part of the lexicon, cf. e.g. xnjor ‘apple’ :
Hurrian hinzuri ‘id.’. While there are relatively few borrowings from Kartvelian
in the oldest language, the areal influence of the Kartvelian languages may explain
the dialectal glottalization of old mediae.¹³ On the syntactic level, the ergative-like
construction with participles in -eal where the agent is in the genitive and the
direct object in the accusative, e.g. nora (gen.) gorceal ē z-gorc (acc.) ‘he has done
the work’, likewise finds parallels in Kartvelian (Stempel 1983: 80-7), but also in
Iranian (Meyer 2017: 109-60).

Occasionally, it seems justified to attribute lexemes exhibiting irregular
sound change to an unidentified Indo-European language. Thus bowrgn
‘tower, pyramid’ and dowrgn ‘potter’s wheel’ have the appearance of derivatives
of *bʰerǵʰ- ‘(be) high’ and *dʰerǵʰ- ‘run’ respectively, but in both cases the
root vocalism and the centum reflex of *-ǵʰ- are at variance with established
Armenian sound laws.

Otherwise, Armenian shows the strongest similarities to the group of Balkan
languages, Phrygian, Albanian and in particular Greek (see Figure 12.1). Some
interesting features of this group are shared with Indo-Iranian (in particular the
augment and the prohibitive adverb *meh₁) and a few with Tocharian.


12.4.1 Armenian and Greek 


The idea of a particularly close relationship between Armenian and Greek has
a long history. Thus Pedersen (1905; 1924) mentioned a number of Greek-Armenian
isoglosses and concluded that no other language was as close to
Armenian as Greek. Later Bonfante (1937) provided a long list of phonological
correspondences, most of them not exclusively Graeco-Armenian; Hamp
(1976) referred to the “growing list of Greek-Armenian isoglosses”, concluding
that the time was “approaching when we should speak of Helleno-Armenian”;
and Lamberterie (1983) considered Armenian to be particularly
close to Greek.


Figure 12.1 The position of Armenian (tree diagram; node labels: Balkanic, Albanian, Graeco-Phrygian, Armenian, W Armenian, E Armenian)


13 Adherents of the “Glottalic Theory” interpret this characteristic feature as an archaism (e.g.
Gamkrelidze 2003 with references).

The opposite stand was taken by Clackson (1994: 199-200), who ended his
investigation with the following negative conclusion: “The absence of any
compelling explanation of a morphological development of either language
suggests strongly that the languages did not form a sub-group.” Even the
impressive number of lexical correspondences was toned down: allegedly,
only five word-pairs might reflect a common agreement made jointly by
Greek and Armenian.

Most recently, Kim (2018) discarded most of the lexical correspondences 
as "general root cognations, not full word equations" and the notion of 
a Graeco-Armenian unity as an example of the "inertia of established 
scholarly opinion". 

However, while the lexical correspondences are certainly the most promin- 
ent, generally dismissing phonological and especially morphological corres- 
pondences seems unwarranted. In fact, a number of early phonological 
innovations in Armenian appear to be shared with Greek. 

This goes for certain patterns of laryngeal vocalizations, particularly in
initial position before consonant (11), in connection with the vowels *i and
*u (14) and of “long resonants”, i.e. *CRHC clusters. As for the initial
vocalization, Greek clearly shows a triple reflex (e/a/o) of vocalized laryngeals,
while this outcome is far from assured for Armenian. In fact, one typically finds
a in place of both *h₂ and *h₃, thus astł ‘star’ = Gr. ἀστήρ; aniw ‘wheel’ ~ Gr.
ὀμφαλός ‘navel’. Indisputable examples involving *h₁ are unfortunately lacking
(see e.g. Clackson 1994: 35).¹⁴ At any rate, the tendency for initial laryngeal
vocalization is not found anywhere else, apart from Phrygian
(Section 12.4.2), and it may to some extent be regarded as a shared innovation.

A closely related change concerns the Greek development of *Cih₂/₃C >
*Ci̯ā/ōC and *Cuh₂/₃C > *Cu̯ā/ōC, which operated in originally unaccented
syllables, as observed in e.g. Gr. ζωός ‘alive’ < *gʷi̯ōu̯ó- < *gʷih₃-u̯ó-.¹⁵ In


14 However, Clackson (1994: 35) considers a single reflex a- most likely on theoretical grounds.
The final decision depends on the exact analysis of atamn ‘tooth’, traditionally derived from the
root *h₁ed- ‘eat; bite’ (or ‘gnaw’?) and anown ‘name’.

15 See Francis 1970: 276-7; Normier 1977: 182 n. 26; Rasmussen 1991; Clackson 1994: 41-9;
Hyllested 2004; Olsen 2009 (for the conditioning); Woodhouse 2015. While this rule, sometimes
referred to as “laryngeal breaking” or “Francis’ Law”, has not met with universal
acceptance, it remains, in our view, the most economical solution to a number of etymological
issues. The only serious counterexample, viz. Gr. θυμός ‘spirit’ (cf. Chapter 11), may be
illusory. As suggested by Kristoffersen (2019), the Greek word, like OHG tuom ‘vapour’ and
Lat. fumus ‘smoke’ (without Dybo's Shortening! Cf. Section 9.2.3), seems to represent an
Armenian, the operation of a similar rule, *-ih₂/₃- > *-i̯ə- > *-i̯a-/*-uh₂/₃- > *-u̯ə- >
*-u̯a-, is suggested especially by erkar ‘long’, which is identical to Gr. δηρός ‘id.’
< *duh₂-ró-. The value of this example has been questioned due to the possible
contamination with the adverb *du̯ah₂m ‘far’ (Hitt. tuu̯an ‘to this side’, tuu̯az ‘from
afar’ and Gr. δήν beside the morphologically aberrant Arm. erkayn), but there is in
fact more Armenian material to suggest that this rule was regular (see Olsen 1992;
1999: 770-3). Note e.g. keam ‘to live’ < *gʷih₃u̯-, which is traditionally difficult
to reconstruct (see Martirosyan 2010: 356-7). The development of these
*CI/UHC sequences may be somehow connected with the rather complex
and poorly understood development of *CRHC clusters in both Armenian and
Greek (Woodhouse 2015). However, as laryngeal breaking is a well-established
feature of Tocharian, it can hardly be considered an exclusive
Graeco-Armenian isogloss.

It has been suggested (Olsen 1989) that Greek and Armenian share
a tendency to voice posttonic *Nt > Nd, though the contexts are not identical
as the development in Greek is restricted to *N̥t, e.g. δέκα, δέκατος ‘ten’ vs.
δεκάς, δεκάδος ‘a decade’, but *h₁enterah₂- ‘entrails’ > Arm. ənderk' vs. Gr.
ἔντερα. Rather than an actual shared innovation, we may be dealing with an
areal feature.

In general, the most significant argument in favour of a common intermediate
proto-language is the existence of shared morphological innovations. For
Greek and Armenian, at least a handful of cases of this kind may be adduced:
• formation of a nu-present *u̯es-nu- from the root *u̯es- ‘dress’: Arm. z-genowm,
Gr. ἕννυμι as a common substitution for the causative *u̯os-ei̯e- (Klingenschmitt
1982: 248)

• formation of a reduplicated aorist *ar-ar-e/o-: Arm. arari ‘I made’, Gr.
ἤραρον ‘I fixed’ (Chapter 11)

• formation of a (reduplicated?) present stem *(si)-slh₂-sḱe-: Arm. ałač'em
‘ask, request’, Gr. ἱλάσκομαι ‘appease’ (Klingenschmitt 1970). The development
*-sk- > -č'- seems to be regular before front vowels, and the reduplicative
syllable would be lost due to syncope in Armenian. While the root is not
exclusively Graeco-Armenian (cf. e.g. Lat. solor ‘console’), the stem formation,
perhaps patterned on *ǵi-ǵn̥h₃-sḱe- (Arm. čanač'em, Gr. γιγνώσκω), is
unique for the two branches

• inflection of the *-men(t)-stems: Arm. sermn, gen. serman,¹⁶ Gr. σπέρμα, -ματος
‘seed’, Arm. ǰermn, gen. ǰerman ‘heat, fever’. Greek and Armenian seem to
have shared the generalization of the suffix variant *-mn̥t- in this type, which is
thus a likely candidate for a common innovation


o-grade, *dʰou(h₂)mo- (Gr. *-Vu- > -ū- before labials) as opposed to the zero grade of Ved.
dhūmá-, Lith. dū́mai.

16 Unstressed *-mn̥t- > -man-. However, an analogical explanation of the Armenian paradigm
cannot be definitely excluded.
• creation of the grammaticalized adjectival suffix conglomerate *-ōdēs
< *-o-h₃od-ēs, lit. ‘smelling’, e.g. Arm. awazowt : Gr. ἀμαθώδης ‘sandy’

• formation of the suffix conglomerate *-e(h₁)u- + -to/ah₂- or -ti- in Arm.
-oyt' < *-e(h₁)u-ti-, e.g. erewoyt' ‘appearance’, Gr. τελευτή ‘end’ < *-e(h₁)u-tah₂-.
The Greek type in -ευσις is late, but a common prestage is most likely a shared
innovation.

The most spectacular evidence for a Graeco-Armenian subgroup remains a set
of lexical isoglosses which vary in nature. Some are simple exclusive root
correspondences, but the following etyma are among the strongest examples
showing common morphological and/or semantic innovations based on
inherited roots. For a comprehensive collection of material, see e.g. Solta
1960, Clackson 1994 and Martirosyan 2013.

• *mēdesa- ‘mind’: Arm. mit, usually pl. mit-k' (gen.-dat.-abl.pl. mt-ac'); Gr.
μήδεα ‘counsels, plans, arts’, cf. μήδομαι ‘to contrive, plan’. At least the long
root vowel, whatever its explanation, seems to be an innovation.¹⁷ Note also
the similar semantics as opposed to Umb. mers ‘law’. The long root vowel
cannot be the reflection of an original Narten-ablaut (pace Clackson 1994:
148) since Gr. μήδομαι only has middle forms. Also, the long vowel forms
found in Germanic and Old Irish are most likely secondary (Meissner 2006:
80-1).

• *dʰeh₁s- ‘god’: Gr. θεός ‘god’ (< *dʰh₁s-o-) agrees semantically with Arm.
di-k' ‘(heathen) gods’ (< *dʰeh₁s-es) as opposed to Lat. feriae ‘holidays’,
fanum ‘temple’ which, together with potential Anatolian cognates, viz.
HLuw. tasan(-za) ‘votive stele’, Lyc. θθẽ- ‘altar’, suggest an original
meaning ‘votive, sacred (thing)’. This would make the semantic change to
‘god’ a shared innovation (Lamberterie 2013: 35-6) in which Phrygian also
takes part, cf. Phryg. (dat.pl.) δεως ‘god’ (Section 12.4.2).

• *mr̥tó- ‘mortal’: Arm. mard ‘(mortal) man, person’, Gr. (Aeol.) βροτός
‘mortal’. Formally, this is obviously the past participle of PIE *mer- ‘to
disappear, to die’. The semantic shift from ‘dead’ (Skt. mr̥tá-) to ‘mortal’,
presumably a contrast formation to the privative *n̥-mr̥to- ‘immortal’, is not
a trivial innovation and has a low chance of reflecting parallel developments.
It is also remarkable that the contrast human : god is expressed by the
same word pair, Arm. mard : dik', Gr. βροτός : θεός.

• *su̯eḱura- ‘mother-in-law’: Arm. skesowr, Gr. ἑκυρά. Presumably this exclusive
Armenian-Greek form replaced the more archaic feminine *su̯eḱruh₂- (cf. Skt.
śvaśrū́-, Lat. socrus, OCS svekry) by analogy with *su̯eḱuro- ‘father-in-law’
(itself probably a secondary derivative of PIE age, see Olsen 2019: 153).
Although this innovation may be said to be trivial, it is not found elsewhere,
where the original uh₂-stem is generally well preserved.


17 It may result from contamination with *meh₁- ‘measure’ (GEW 2: 223).
• *mātru(u̯)i̯ah₂- ‘stepmother’: Arm. mawrow, Gr. μητρυιά. Armenian and
Greek agree in derivation and meaning as opposed to OE modrige ‘mother's
sister’. It is uncertain whether the Germanic forms reflect the same derivation.
Clackson (1994: 145-7) considers this isogloss insignificant since both
the form and meaning might be archaic (see also Olsen 2019: 156-7). On the
other hand, the agreement of an exclusive form and meaning ‘stepmother’ as
opposed to the expected ‘mother's sister’ in Germanic is striking enough to
suggest a joint innovation.

• *preis-gʷh₂-u- ‘one who goes in advance, elder’: Arm. erēc', gen.sg. eric'ow;
Gr. πρέσβυς, Cretan πρεῖσγυς (Lamberterie 1990: 909-11, Clackson
1994: 165; on the phonology, see Olsen 1988). Lat. priscus ‘ancient’, an
o-stem, is unlikely to continue an older u-stem and rather reflects the suffix
*-ko-, cf. Weiss 2020: 315.

• *osara- ‘harvest’: Arm. (amis) ara-c' ‘the sixth month of the ancient
Armenian calendar (month of harvest)’ and Gr. ὀπ-ώρα ‘part of the year
between the rising of Sirius and of Arcturus, between summer and autumn’.
The shared preform *osara- (or *ohara- if *s > h was a shared development)
seems to be a thematization of the PIE strong stem *h₁os-r-, cf. Ru. ósen'
‘autumn’, Goth. asans ‘harvest’ (Martirosyan 2013: 110).

* *g"Ih»(a)no- ‘acorn’: Arm. katin, Gr. BóAavoc (Clackson 1994: 135). Greek 
and Armenian are the only branches to agree on the suffix, cf. Lat. glans 
(< *g"Ih;-nd"-), RuCS Zeludo (< *g"elh;-ond'-), Lith. gilé (< *g"Ih;-iah;-). 

• *perHi-men- ‘piercing object’: Arm. heriwn ‘awl’ < *perHimōn, Gr. περόνη
‘pin, buckle, brooch’ < *perHimneh₂, cf. ἀκόνη ‘whetstone’ : ἄκμων ‘anvil’.
It may be assumed that the root is *perHi-, which would explain Gr. πείρω,
OCS na-perǫ ‘pierce’ as simple thematic presents (Olsen 1999: 492). Of
course, it cannot be excluded that this isogloss is a shared archaism.

• *pseud- ‘lie’: Arm. sowt ‘false’, stem ‘lie’, Gr. ψεύδομαι ‘deceive, lie’,
ψεῦδος ‘lie’ (Clackson 1994: 168-9). If the basic root is *pseu- ‘blow’, as
suspected by Taillardat (1977: 352-3; cf. Fr. vendre du vent, Eng. windy, hot
air), only Armenian and Greek agree on the root-extension -d- and the
semantic specialization. Moreover, Arm. sowt < *psudo- has the appearance
of a contamination of a ro-adjective, like Gr. ψυδρός, and a full-grade s-stem,
like Gr. ψεῦδος, meaning that traces of the Caland system would have
survived into a common prestage. This favours a common Graeco-Armenian
innovation.

• *meǵh₂r- ‘make great’: Arm. mecarem ‘honour’, Gr. μεγαίρω ‘grudge,
envy’. The denominative verb based on the r-stem variant of the heteroclitic
corresponding to Ir. *mazar-/mazan- or *masar-/masan- (Kümmel 2012) is
almost certainly a common innovation.

• *drep-n̥nah₂- ‘sickle’ in Gr. δρεπάνη ‘sickle’, Arm. artewan (-ownk', -anc'/
-ac') ‘eyelid; brow’ (Lamberterie 1983: 21-2). The root *drep- is not
exclusively Graeco-Armenian, thus Ru. drápat’ ‘scratch, tear’ beside Gr.
δρέπω ‘pluck, cut off’, but the striking correspondence consists in the
derivational chain *drep-mn̥ (Gr. (Hsch.) δρέμμα· κλέμμα (about stealing
fruit); Beekes 2010: 353) ⇒ *drep-n̥nah₂- > artewan-/δρεπάνη, very much
in accordance with inherited principles. Clackson's tentative suggestion
(1994: 112) of a very early loan from Greek is extremely unlikely, as we
have no examples of Greek loanwords borrowed before the soundshift
(*d > t).

• *h₂alh₁-trih₂- or *h₂l̥h₁-trih₂- ‘female miller’: Arm. aławri ‘female who
grinds corn’, Gr. ἀλετρίς ‘female slave who grinds corn’. Apparently a vr̥kī́ḥ-type
derivative of an agent noun in *-ter/tor-, an otherwise extinct derivational
type in Armenian. Clackson’s suggestion (1994: 92) of “a secondary
derivative of an unattested instrument noun *aławr ‘mill’” is less economical.
Again, a common innovation is the simple solution.

• *dʰal-ro- or *dʰHl-ro-: Arm. dalar ‘green, fresh’, Gr. θαλερός ‘blooming,
fresh, abundant’. As Gr. -λρ- is phonotactically impossible, and Arm. -lr-
never represents an old consonant cluster, Gr. -ερο-, Arm. -ar- do not
necessarily continue a sequence *-Vro-; more likely, we are dealing with an
old *-ro-stem, only attested in Armenian and Greek. The root, however, is
also found in Alb. dal ‘sprout, enter, come’.

Some isolated roots might be retentions from PIE but are still worth taking into
account.

• *ḱen(-eu̯)-o- ‘empty’: Arm. sin, Gr. κενός, Ion. κεινός, Hom. κενεός (cf.
Clackson 1994: 138).

• *mosǵʰ- ‘young bovine’: Arm. moz-i, Gr. μόσχος. Clackson's (1994: 154)
suggestion of a borrowing from Greek to Armenian seems phonetically
impossible and the relatively late (eleventh century) attestation of the
Armenian word is not a serious problem in itself. Most likely, it is a shared
borrowing, but IE origin cannot be excluded.

• *ḱīu̯ōn ‘pillar’: Arm. siwn, Gr. κίων. The appurtenance of other cognates (cf.
Lubotsky 2002; Chapter 11) is uncertain, but cannot be excluded. Clackson
(1994: 140-1) considers this word a shared borrowing, which would make it
an important isogloss as the forms are identical.

• The root *h₃bʰel-, exclusively attested in Greek and Armenian, has the
double meaning ‘increase’ and ‘sweep’ in both languages: Arm. awel
‘broom’, awelowm ‘increase’ : Gr. ὄφελτρον ‘broom’, ὀφέλλω ‘sweep’
(Hipponax) and ‘increase’; the verb also forms a thematic aorist in both
languages: Arm. y-awel, Gr. ὤφελε (Clackson 1994: 156-8).

• Arm. awr ‘day’ ~ Gr. ἦμαρ (cf. Chapter 11).

Finally, a number of words seem to have been borrowed at a common prestage
of Armenian and Greek as the attested forms allow for reconstructions of proto-forms
which, for different reasons, are unlikely to be inherited from PIE. The
shared substrate interface seems to contain several chronological layers, some
presumably formed after particular Armenian or Greek sound changes.¹⁸ The
following examples, where all sound changes are observed, can be considered
part of the earliest layer which may have been contemporaneous with a shared
Graeco-Armenian language stage.

• *aiǵ- ‘goat’: Arm. ayc ‘(she-)goat’, Gr. αἴξ, αἰγός. Note the Arm. plural form
ayci-k' (beside ayc-k') and derivatives ayceay ‘made of goatskin’, ayceamn
‘roebuck’ which can reflect the same *ih₂-collective as Gr. αἰγίς ‘goatskin’.
The etymon is probably non-IE (Solta 1960: 405; Kortlandt 1986: 38-9; and
especially Kroonen 2012: 245-6). Lith. ožys, Skt. ajá- reflect *aǵ- without
the semivowel and although the forms are unlikely to be separated completely,
the variation cannot really be explained in a PIE framework.¹⁹ In
light of this, the Armenian-Greek agreement in both root structure and
derivation should be considered highly significant. Another possible match
is found in Alb. edh ‘kid’, dhi ‘she-goat’ < *aiǵ-ii̯ah₂ (Demiraj 1997: 160).

* *antʰ-r- ‘coal, ember (?)’: Arm. ant'-eł ‘hot coal, ember’, ant'-ayr ‘spark’ 
(< *antʰari-), dial. ant'roc' ‘poker’; Gr. ἄνθραξ ‘charcoal’ (Jahowkyan 1987: 
592; Martirosyan 2010: 85; 2013: 113). A substratum origin is supported by Geo. 
ant-eba ‘to burn’ and the fact that the shared root seems to contain voiceless *tʰ 
while there is no external support for a reconstruction *h₂antH- vel sim. 

* *sepʰs- ‘to boil, cook’: Arm. ep'em ‘to cook’, Gr. ἕψω ‘to boil, seethe’. It is 
unlikely that Arm. p' continues intervocalic *-ps-, cf. eres ‘face’ < 
*kvrepsah₂ (Olsen 1999: 64; alternatively Witczak 1991). Again, there are 
few other options than to reconstruct a voiceless aspirate, perhaps from 
a non-IE source. 

* *tūpʰ- ‘plant, bush (?)’: Arm. t'owp' (gen. t'p'oy) ‘bush, bramble’, Gr. τύφη 
‘reed mace, Typha angustata’. Although the semantic details are not fully 
clear, and Armenian has an o-stem as opposed to the Greek feminine, the 
roots are identical. The root structure points to a substratum origin. Lat. tūber 
‘swelling’, ON þúfa ‘knoll’ may be separate borrowings from the same 
source or entirely unrelated. 

* *tarp- ‘basket’: Arm. t'arp' ‘fishing basket, creel’, also t'arb as a literary 
form meaning ‘wooden framework’ (HAB 2: 162; Martirosyan 2010: 281–2 
with references); Gr. τάρπη ‘large wicker basket’. There are no convin- 
cing IE etymologies (Chantraine 1999: 1095; Clackson 1994: 183; 
Martirosyan 2010: 281–2). This etymon may represent a very early bor- 
rowing, with the regular Armenian outcome of *tarp- being represented in 
the form t'arb. 


18 Cf. e.g. Arm. sex ‘melon’ ~ Gr. σικύα ‘bottle-gourd’ with no change of *s > h in either language. 
See also Martirosyan 2013: 122–3. 

19 For this reason, the connection with Av. izaena ‘leathern’ from a putative zero grade *h₂ig-, 
mentioned e.g. by Martirosyan (2010: 58), is less likely. 
Summing up, the relations between Armenian and Greek seem to be signifi- 
cant enough to justify a common node. They do not only consist of shallow 
lexical correspondences. The common morphological innovations are far from 
negligible, and in numerous cases, a given lexical item shows a striking 
similarity with respect to word formation and semantics. Exclusive loanword 
isoglosses further confirm this standpoint. 


12.4.2 Armenian and Phrygian 


The idea of a special relationship between Armenian and Phrygian goes back to 
Herodotus (7.73), who claimed that the “Armenians” (Ἀρμένιοι) were descend- 
ants of the Phrygians, and to a quotation from Eudoxos preserved by Stephanos 
of Byzantium, according to which the Armenians come from Phrygia and speak 
a language very similar to that of the Phrygians. However, the 
closest known relative of Phrygian is undoubtedly Greek (Chapter 11), and 
while both Armenian and Phrygian may be attributed to the Balkan group of 
Indo-European of which Greek seems to be the central member, there are no 
exclusive isoglosses between the two.20 


12.4.3 Armenian and Albanian 


Like Greek, Armenian and Phrygian, Albanian appears to belong to the 
Balkanic languages in the narrower sense, but apart from the palatalization of 
labiovelars as opposed to plain velars, perhaps a parallel development of the 
cluster *su- and a few lexical correspondences (Kortlandt 1986), there are 
hardly any conspicuous exclusive isoglosses between Armenian and Albanian 
(see further Chapter 13).21 


12.5 The Position of Armenian 


In Matzinger's treatments of the question (2005b: 382; 2012), Greek has the 
central position within the Balkanic group with direct relations to Phrygian, 
Armenian, Albanian and perhaps, surprisingly, Tocharian.22 Evidence for the 
inclusion of Tocharian is extremely weak, however, and it is generally con- 
sidered an entirely separate branch of Indo-European (see Chapter 6). Evidence 
for the Balkanic group is found at all levels, phonology, morphology and 
lexicon, and can be summarized as follows: 

* “laryngeal breaking” (14): Greek, Armenian and Tocharian 


20 See Matzinger 2005b and 2012 for details. 
21 Details on the connection between Armenian and Albanian are presented by Kortlandt (1986). 
22 See e.g. also Klingenschmitt 1994 and the somewhat idiosyncratic overview by Holst 2009. 

* development of at least *-ih₂ > *-io; (14): Greek, Armenian and Albanian 
(Klingenschmitt 1994: 244–5) 

* prothetic vowels (11): Greek, Phrygian and Armenian; Greek and Phrygian 
agree on “triple representation” 

* traces of labiovelars in satem languages. In Armenian and Albanian, old 
voiceless and voiced aspirated labiovelars seem to palatalize (Pisani 1978), 
and a similar tendency may be observed in the centum language Greek, where 
labiovelar mediae typically avoid palatalization, cf. e.g. Arm. keam ‘live’ : 
Gr. βέομαι, βίοτος. Here we seem to be dealing with an areal feature 

* loc.pl. ending *-si for *-su: Greek, Albanian; the origin of Arm. -s is 
unknown 

* mid.1sg. primary ending *-mai for original *-h₂ai: Greek (-μαι), Armenian 
(-m), Albanian (-m) 

* formation of s-aorists in *-ah₂-s- from denominative verbs in *-ah₂-ie/o-: 
Greek, Armenian and Albanian (see Søborg 2020: 78–80, 103, elaborating 
on Klingenschmitt and Matzinger); this connection presupposes that the 
Armenian aorist marker -c'- derives from the s-aorist 

* aorist *e-kʷle-to ‘became’: Greek, Armenian, Albanian (Gr. ἔπλετο, Arm. 
ełew, OAlb. cleh, see LIV² 386–7) 

* negation *(ne) h₂oiu kʷid: Gr. οὐκί, Arm. oč' and Alb. as, but cf. also, as 
demonstrated by Fellner (2022), the closely related emphatic negation Toch. 
A mā ok, B mawk/ma,k 

* *aig- ‘goat’: Greek, Armenian and Albanian 

* *dʰeh₁s- ‘god’: Gr. θεός ‘god’ (< *dʰh₁s-o-), Arm. di-k' ‘(heathen) god’, 
Phryg. dews 

* additional -ai(k)- in the inflection of the word for ‘woman’: Gr. γυναικ-, 
Phryg. acc. κναικαν, Alb. gra (Matzinger 2000); synchronically, Arm. 
kanayk' is simply the nom.pl. of a stem kanay-, but it cannot be excluded 
that the ending -k' is due to a reinterpretation of a suffixal -k- 

* *gʷʰermo- ‘warm’: a full-grade mo-adjective common to Gr. θερμός, Arm. 
jerm and Alb. zjarm 

A discussion of the relationship between the Balkan group and Indo-Iranian, 
including such features as the augment, which may theoretically represent an 
archaism, is beyond the scope of this chapter. 
13 Albanian 


Adam Hyllested & Brian D. Joseph 


13.1 Introduction 


Albanian is sometimes considered the stepchild of Indo-European linguistics, for 
various reasons. For one, it is the latest attested IE branch; its first documentation 
is a 1462 one-line baptismal formula, and the first substantial text the 1555 
Missal of Gjon Buzuku. Due to this late attestation, many details of its historical 
development are shrouded in mystery, and its present form does not always 
appear obviously Indo-European. Consider, for example, the numerals gjashtë 
‘6’ and tetë ‘8’, which despite looking strikingly different from, say, Latin sex 
and octō, in fact reflect the expected outcomes of PIE *séks-tV- and *oktō-tV-. 

Moreover, the complicating factor of heavy external influence can make it 
difficult to determine what is inherited from PIE. Not only are there Albanian 
borrowings from Ancient Greek, Latin (sensu lato), Slavic, Turkish, and Italian, as 
well as from neighbouring Balkan languages, but there is also structural conver- 
gence with other Balkan languages, especially Modern Greek, Macedonian, 
Aromanian, and Romani, but also Turkish, and, by extension, Bulgarian, 
Meglenoromanian, and Romanian. This convergence covers phonology, e.g. voi- 
cing of nasal + stop clusters, as in këndoj ‘sing’ (borrowed from Latin canto), 
matching a development in Greek and Aromanian; morphology, e.g. the merger of 
genitive and dative cases, matching a development in Greek, Aromanian, 
Romanian, Macedonian, and Bulgarian; syntax, e.g. doubling of direct or indirect 
objects by weak pronouns, matching a development in Greek, Aromanian, 
Romanian, Macedonian, Bulgarian, and to some extent, Romani; and semantics, 
e.g. creation of admirative mood forms to mark non-confirmativity, matching 
a development in Macedonian, Bulgarian, and Turkish. 


13.2 Evidence for the Albanian Branch 


These difficulties notwithstanding, several innovations define Albanian and set 
it apart from all other branches of IE, including 

* *s > [ɟ] (in IPA; spelled ⟨gj⟩ in standard Albanian orthography) in initial position 
before a stressed vowel, cf. gjashtë ‘6’ < *séks-tV- vs. shtatë ‘7’ < *septm-tV-. ⟨gj⟩ 
represents a voiced dorsopalatal stop, though with varied secondary outcomes 
dialectally. This change is unparalleled within IE. 

* *ḱ > [θ] (spelled ⟨th⟩), a change otherwise found only in Old Persian among 
the IE branches; e.g. athët ‘harsh, sour’ < *aḱ- ‘sharp’ (cf. Ved. áś-man- ‘stone’) 

* *ǵ(ʰ) > [ð] (spelled ⟨dh⟩), also unparalleled within IE,1 e.g. udhë ‘way’ 
< *uǵʰ-o- (the root of Lat. veh-ō ‘convey’) 

* loss of word-internal voiced stops under certain conditions, e.g. ujë ‘water’ 
< PAlb. *ud-r-jā 

* *ō > e, as in tetë ‘8’ < *oktō-tV- 

* *ē > o, as in mos ‘not; don’t!; lest’ < *meh₁-kʷid (cf. Gr. μή) 

* -ni as 2pl. non-past verbal ending, e.g. present indicative ke-ni ‘you all have’, 
imperative ki-ni ‘you all have’, from a reanalysed and repurposed adverbial 
*nu ‘now’ (Rasmussen 1985) 

* a postposed definite article, as in det-i ‘the sea’ (literally ‘sea-the’).2 

These characteristics give ample cause for treating Albanian as a separate 
branch within IE, even with various complications in analysing forms. 


13.3 The Internal Structure of Albanian 


Despite constituting its own branch within IE, Albanian is hardly a linguistic 
monolith. In fact, there are major dialect divisions within the branch, the oldest 
and most important being a north-south one: the Geg dialect group occurs 
north of the Shkumbin river (roughly in the middle of present-day Albania), 
thus covering northern Albanian and the Albanian of the nation-states of North 
Macedonia, Kosova, and Montenegro, while the Tosk group occurs south of the 
river, and includes the Arbëresh diaspora communities of southern Italy and the 
Arvanitika diaspora communities scattered around Greece. 

Dialect differences separating Geg and Tosk involve all levels of linguistic 
structure. In phonology, Geg has nasalized vowels whereas Tosk has lost nasal- 
ization (e.g. âsht ‘is’ vs. Tosk është < *ensti < PIE *h₁en-h₁esti), maintains 
intervocalic -n- whereas Tosk denasalizes it to -r- (e.g. venë ‘wine’ vs. Tosk 
verë) and has reduced nasal-plus-stop clusters to nasals whereas Tosk maintains 
the clusters (e.g. nimoj ‘I-help’ vs. Tosk ndihmoj). In morphology, Geg has 
participials in -m- (among other endings) whereas Tosk mostly uses -uar (e.g. 
harrum ‘forgotten’ vs. Tosk harruar), and Geg forms its future tense with an 


1 The notation *ǵ(ʰ) indicates that the PIE voiced aspirated and voiced plain stops generally merged in 
Albanian; while this development is characteristic of Albanian, it is not particularly striking within 
IE, occurring, presumably independently, in Anatolian, Balto-Slavic, Celtic, Iranian, and Tocharian. 

2 This feature is found also in neighbouring languages, especially Aromanian, Macedonian, and 
Romanian, suggesting causality through contact rather than internal innovation within Albanian. 
However, Hamp 1982 argues that the ancient toponym Drobeta (in present-day Romania) 
reflects a Roman misinterpretation of *druwa-tā ‘the wooded (place)’, with a postposed definite 
article, suggesting it reflects an old Albanian syntagm. 
inflected form of ‘have’ plus an infinitive (consisting of me with a participial) 
whereas Tosk uses an invariant (3sg.) form of ‘want’ with an inflected subjunct- 
ive with the modal marker të (e.g. ke me shkue ‘you will go’ (literally “you-have 
to gone”) vs. Tosk do të shkosh (“it-wants that you-go”)). In syntax, Geg uses its 
(uninflected) infinitive with me in complement structures where Tosk uses the 
(inflected) subjunctive with të, e.g. filloj me shkue ‘I begin to go’ (literally 
“I-begin to gone”) vs. Tosk filloj të shkoj (literally “I-begin that I-go”). Finally, 
there are lexical differences, e.g. Geg tamel ‘milk’ vs. Tosk qumësht. 

Within the Geg and the Tosk dialect complexes, there is much regional 
variation, the details of which are beyond the scope of this chapter. It can be 
noted, though, that diaspora varieties of Tosk show the effects of differential 
contact situations: Arbëresh in Italy not only has many Italian loans not found 
in Balkan Tosk, e.g. kamineta ‘chimney’ (cf. Italian camineta ‘fireplace’) but 
also lacks Turkish loanwords (cf. Balkan Tosk oxhak ‘chimney, fireplace’, from 
Turkish ocak), reflecting its absence from the Balkans after approximately the 
fifteenth century. Similarly, Arvanitika in Greece shows various Greek features 
not generally found in Tosk; for instance, according to Sandfeld (1930: 104), in 
Arvanitika, mnj (Sandfeld's notation) occurs for mj elsewhere in Balkan Tosk, 
e.g. mnjekrë ‘chin; beard’ (vs. general Tosk mjekër), a shift he states is “comme 
en grec” (cf. Thumb 1912: 830, who reports colloquial Greek μνια ‘one.FEM’ 
(presumably [mnja] or [mɲa]) versus earlier, and still occurring, μιά ([mjá])). 


13.4 The Relationship of Albanian to the Other Branches 


Albanian shows mixed dialectal affinities, sharing key features with different 
sets of languages within IE. This situation makes for a complicated determin- 
ation of how to subgroup Albanian with other branches. Ultimately, although 
no consensus prevails as to the exact classification of Albanian, we argue here 
that lexical and morphological isoglosses point to a Greek-Albanian subgroup, 
a grouping suggested by computational phylogenetic methodology in Chang 
et al. 2015 (see Section 13.5.2; note also Holm 2011). 

We base our discussion largely on significant, non-trivial innovations 
Albanian shares with other branches. However, what counts as a shared innov- 
ation as opposed to a shared retention of course depends on decisions made 
about the nature of the proto-language in question. Thus, assessments about 
subgrouping can become complicated and involved. 

For instance,3 Cowgill (1960) proposed that Greek οὐ(κί) ‘not’ could be 
connected with Armenian oč‘ ‘not’, with both deriving from a phrase *ne ... 


3 Other cases like this of what we consider retentions, but which some scholars might see as 
innovations, are the use in prohibitions of *meh₁ (Alb. mos, Gr. μή; see also Section 13.4.7 
Inflection and Morphosyntax) and the use of the augment in marking past tense forms. Space 
limitations preclude discussion here; see Joseph 2013. 
h₂óiu kʷid, composed of the negative marker *ne, the noun *h₂óiu ‘life-force’, 
and the indefinite pronoun *kʷid, thus originally “not on (your) life; not at all”, 
as an emphatic negator. He conjectured, following Pedersen 1900, that the 
Albanian negative as ‘nor, and not’ might belong here too but was reluctant to 
pursue the connection. Joseph (2005; 2022) has followed up on the Albanian 
angle, arguing that the negative prefix as- ‘not’, as in as-gjë ‘nothing’ (cf. gjë 
‘thing’), is what matches οὐ(κί) and oč‘.4 Ostensibly, this *ne ... h₂óiu kʷid 
phrasal negation could be a shared innovation linking Albanian, Armenian, and 
Greek (Section 13.4.8), if restricted to those branches. However, Garnier 2014 
and Fellner 2022 have argued that Latin haud ‘not’ and Toch.A mā ok, Toch.B 
mawk, ma,k, respectively, also reflect *(ne) ... h₂óiu kʷid, so this negator is 
shared by languages that do not otherwise show evidence for being subgrouped 
together. Thus *ne ... h₂óiu kʷid must be of PIE age, so its occurrence in these 
languages is a shared retention inherited in each and therefore irrelevant to 
subgrouping. Any potential shared innovation in principle must be examined 
carefully to determine its status vis-à-vis innovation versus retention. 
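
The reasoning in this paragraph can be sketched in a few lines of Python. This is only a toy illustration under strong simplifying assumptions: the `Feature` class, the `classify` function and the branch labels are hypothetical names introduced here, and the rule deliberately ignores borrowing, parallel innovation and the quality of the individual etymologies, all of which matter in real subgrouping work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A linguistic feature and the branches in which a reflex is attested."""
    name: str
    branches: frozenset


def classify(feature: Feature, subgroup: frozenset) -> str:
    """Crude test: a feature can support a subgroup only if it is exclusive to it."""
    inside = feature.branches & subgroup
    outside = feature.branches - subgroup
    if not inside:
        return "not attested in subgroup"
    if outside:
        # Reflexes outside the proposed subgroup suggest inheritance from PIE.
        return "probable retention"
    return "candidate shared innovation"


# Hypothetical encoding of the proposed Balkanic grouping discussed in the text.
BALKANIC = frozenset({"Albanian", "Armenian", "Greek"})

# The emphatic negator as it looked when only three branches were compared ...
negator_old = Feature("*ne ... h2oiu kwid", frozenset({"Albanian", "Armenian", "Greek"}))
# ... and after reflexes were also proposed for Latin (Italic) and Tocharian.
negator_new = Feature(negator_old.name, negator_old.branches | {"Italic", "Tocharian"})

print(classify(negator_old, BALKANIC))
print(classify(negator_new, BALKANIC))
```

Adding the two extra branches flips the verdict from a candidate shared innovation to a probable retention, which is the point made above about the Latin and Tocharian evidence.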

As noted above, there are numerous, often contradictory, indications of close 
connections between Albanian and other branches of IE, and though we 
ultimately favour the connection with Greek, we review here the evidence 
that aligns Albanian with one or another branch of IE. 


13.4.1 Albanian and Balto-Slavic 


Various features connect Albanian with Balto-Slavic. We mention a few here, 
and point interested readers to Porzig 1954: 174-7, Jokl 1963, Çabej 1975, 
Huld 1984: 166, Orel 1994; 2000: 254-6 for further details and assessment. 


13.4.1.1 -teen Numerals Albanian forms the teen numerals eleven to nine- 
teen using a pattern of DIGIT-on-TEN, e.g. njëmbëdhjetë ‘eleven’ (cf. një ‘one’, 
mbi ‘on’, dhjetë ‘ten’), that seems to parallel Slavic (e.g. Ru. odinnadcat’ 
‘eleven’ (cf. odin ‘one’, na ‘on’, désjat’ ‘ten’)) and part of Baltic, specifically 
Latvian (e.g. vienpadsmit ‘eleven’; Lithuanian aligns with Germanic here, 
using a formative based on *leikʷ- ‘leave’, not a form of ‘ten’). However, 
there is one key difference between the Albanian and the Slavic/Latvian 
patterns. Albanian, along with Romanian, has a feminine form of ‘ten’, 
shown by the use of the feminine tri ‘three’ with dhjetë ‘ten’ in the formation 
of ‘thirty’, tridhjetë, whereas Slavic has a masculine form, as in the Russian use 


4 The relationship between the free word as and the prefix as- is disputed; Joseph sees them as 
having different origins, while others connect them. That issue is irrelevant here, as the fact of 
there being some Albanian cognate to the Greek and Armenian forms is all that matters in this 
case. See also Hackstein 2020 on sources of negation markers in Albanian, including *ne ... 
h₂óiu kʷid. 
of masculine dva ‘two’ in the formation of ‘twenty’, dvádcat’ (literally “two 
tens”); Romanian ‘twenty’ is douăzeci (literally “two tens”), with 
feminine două, thus with feminine ‘ten’. 

Following Hamp (1992), these facts can be interpreted for the Balkans as 
follows. The variety of IE destined to become Albanian (Hamp's “Albanoid”) 
was a Northern IE language, grouped with or in contact with Germanic and 
Balto-Slavic. Within Baltic, Lithuanian absorbed the teen-numeral pattern of 
Germanic, whereas Latvian interacted with Slavic and Albanoid, an inner- 
Baltic difference that makes sense geographically. Albanoid, along with 
Latvian and Proto-Slavic, developed the DIGIT-on-TEN pattern, presumably 
an innovation in one language that spread by contact into the others, but its 
speakers changed this pattern as they moved south into the Balkans and came 
into contact with the variety of Latin that some of its speakers shifted to, 
yielding Romanian. This scenario accounts for both the similarities between 
Albanian and Slavic (and Latvian) and the differences within Baltic, while still 
allowing for the specific Albanian-Romanian parallel to emerge. 


13.4.1.2 Winter's Law Winter (1978) posited for Baltic and Slavic the length- 
ening of vowels before PIE voiced plain stops (mediae, e.g. *d), a prime 
example being Balto-Slavic *sēd- ‘sit’ (cf. infinitives Lith. sė́sti and OCS 
sěsti), from PIE *sed-. Albanian seems to similarly show this development, 
in forms such as rronj ‘endure’ < *rēg-n- (with o regularly from earlier *ē; for 
the root, cf. Gr. ὀρέγω ‘extend’) or erë ‘smell’ < *ōd-r- (PIE *h₃ed-, cf. Lat. 
odor), although this may alternatively reflect compensatory lengthening with 
the loss of the stop (Hyllested 2013). 


13.4.1.3 Lexical Isoglosses Several scholars have noted sizeable lexical 
overlap between Balto-Slavic and Albanian. Orel (1998: 250-6) counts twenty- 
four shared items, deeming this group of isoglosses the “most important and 
significant” one. As many as forty-eight words are allegedly shared between 
Albanian and Baltic only, leading Orel to call this connection “particularly 
close", while he further lists twenty-two terms shared just by Albanian and 
Slavic (“not as frequent as Baltic ones”). 

However, not all of these etymologies appear equally convincing. For 
example, Alb. bac ‘elder brother; uncle’ must be borrowed from Slav. *bat’a 
‘elder brother; father’, not cognate with it (Hyllested 2020: 402); Alb. shtrep, 
shtrebé ‘cheese-fly larva’, rather than being related to Slav. *strup» ‘scab’, 
belongs with Gr. ozpégo ‘twist’, as is not least apparent from its inner-Albanian 
cognate shtrembet ‘be crooked’ (Hyllested 2016: 75); and Alb. murg ‘dark, 
grey’ ~ Lith. márgas ‘colorful’ do not constitute an isogloss but are clearly 
related to both PGmc. *murkaz ‘dark’, Gr. åuoppóç ‘dark’ and Slav. *mergo 
‘brown’. 
Crucially, the more promising of these comparanda are, in most cases, 
morphologically and/or semantically more distant from each other than the 
proposed Helleno-Albanian isoglosses. Alb. brez ‘belt’ vs. Lith. briauna ‘edge’ 
is a typical example: these two words undoubtedly contain the same IE root but 
with markedly different word-formation and meanings that differ significantly. 
Thus, while the item is useful in a general comparative analysis, it is less so as 
evidence for subgrouping. A systematic analysis of all relevant forms goes 
beyond our scope, but one can fairly say that the number of closely knit 
lexemes with strong etymologies is in fact not significantly higher between 
Albanian and Balto-Slavic than one would expect between any two IE 
branches. 


13.4.2 Albanian and Armenian 


Considering the large number of shared innovations between Albanian and 
Greek on the one hand (Section 13.4.7) and between Greek and Armenian on 
the other (Section 12.4.1), it is perhaps surprising how few can be found 
between Albanian and Armenian only. This does not speak against a Palaeo- 
Balkanic subgroup encompassing all three since it may simply reflect the fact 
that Greek preserves so much more IE lexical material, including Balkanic 
innovations, than the other two.5 Most famous among the relevant isoglosses is 
Alb. zog ‘bird; nestling; (dial.) animal young’ ~ Arm. jag ‘little bird, sparrow; 
nestling’, as if from a protoform *ǵʰuagʰu- (Jokl 1963: 152; Olsen 1999: 
110–11); however, it may constitute a shared retention since its root etymology is 
unknown. 

A shared inflectional feature is the new masculine *smi-i-o- for the numeral 
‘one’, Alb. një and Arm. mi, based on the Balkanic feminine *smi-i-a with 
breaking from PIE *sm-ih₂ as in Gr. μία (Klingenschmitt n.d.: 22). 

In derivational morphology, Armenian and Albanian share a productive 
agent-noun suffix *-ikʷjo- > Arm. -ič', Alb. -ës (Matzinger 2016: 167; 
Thorsø 2019: 252), which we see as derived from PIE *kʷei- ‘gather’ (cf. Gr. 
ποιέω ‘make’). 

One phonological development shared by Albanian and Armenian is loss of 
*m in the cluster *-ms-, cf. Alb. mish ‘meat’ ~ Arm. mis ‘id.’ < PIE *mems-o-; 
Arm. ows ‘shoulder’ vs. Gr. ὦμος ‘shoulder’ < PIE *h,ömsos. This must 
however reflect two parallel developments if, as we argue, Albanian and 
Greek (or, for that matter, Armenian and Greek) form a subgroup within 
Balkanic, since Greek preserves the *-m-. 

Other joint phonological features relate to centum-satem behavior and are 
mostly systematically parallel, not necessarily substantially identical. First and 


5 See Section 13.4.8 on innovations shared by the entire proposed Balkan group. 




foremost, like Albanian, Armenian keeps a three-way distinction of PIE dorsals 
(see Section 13.5.1). But both languages also have a development of PIE *ku- 
and *g’u-, which, like everywhere in the satem area proper, is different from 
both that of the palatals and that of labiovelars but at the same time, unlike 
Indo-Iranian and Balto-Slavic, shows no direct trace of the semivowel; e.g. 
Alb. ze, def. zëri (Geg za, záni) ‘voice’, Arm. jayn ‘voice, sound’ ~ OCS zvon» 
‘noise’ < PIE *S"uönos. 


13.4.3 Albanian and Celtic 


Few traits, almost exclusively lexical in nature, link Albanian specifically with 
Celtic. A quite optimistic pioneering collection of isoglosses by Jokl 1927 was 
subjected to critical scrutiny by Çabej 1969, who effectively disqualified much 
of the evidence. Most famous is the similarity between Alb. gju ‘knee’, S Tosk 
glu, Geg gjû, def. gjuni, ~ PCelt. *glūnos ‘knee’ (OIr. glún, Welsh glin), 
apparently involving a new stem-form *gnu-n- from PIE *ǵénu with subse- 
quent dissimilation to *glu-n-. 

The remaining evidence amounts to nothing more than what would be 
expected statistically; Orel (2000) mentions only six items. Moreover, the 
picture is somewhat blurred by the fact that many apparent shared lexemes 
are likely early Celtic borrowings into Proto-Albanian from when Celtic tribes 
such as the Serdi and the Scordisci settled in the Balkans in the third century 
BCE. This may, e.g., be the case with Alb. shqipe ‘eagle’, which, like Welsh 
ysglyf ‘eagle’, is derivable from a proto-form *sklubo-, metathesized from 
earlier *skublo- from which the other attested Celtic forms developed 
(Hyllested 2016: 76—7). 


13.4.4 Albanian and Germanic 


Ringe, Warnow, and Taylor (2002), in a statistical-quantitative analysis of the 
IE lexicon, reached the apparent result of an Albanian subgroup with 
Germanic, the significance of which the authors themselves downplayed, and 
with good reason: the absolute number of lexical cognates shared by these 
branches only is relatively moderate. Orel 1998: 253-4 lists just thirteen, not all 
with equally valid etymologies; for example, tym ‘smoke’ must be borrowed 
from Gr. θυμός (with an older meaning than the attested ‘anger’), rather than 
related to PGmc. *ēdumaz ‘breath’.6 Moreover, the lexical isoglosses are not 
corroborated by many shared grammatical elements or features. 


6 One oft-mentioned item is Alb. det ‘sea’, Arbëresh dej(ë)t, usually etymologized as PAlb. 
*deubeta, corresponding to PGmc. *deupiþō- ‘depth’. Hyllested (2016: 71 n. 12) instead 
suggests it could be a borrowing from Gr. δέλτα ‘river delta’. At least two other Albanian 
There are nonetheless some remarkable cases of shared word-formation. 
One recently published etymology is hundë ‘nose’ < PAlb. *skunta ~ Far., 
SWNw. skon ‘snout’ < PGmc. *skuna- (Hyllested 2012). Alb. delme ‘sheep’ is 
only a metathesis away from corresponding regularly to Dalecarlian tembel 
‘sheep’ < PGmc. *tamila-, a derivative of PGmc. *tamjan ‘to tame’ < PIE 
*demH-; treating the nasal rather than the lateral as original to the Albanian root 
is supported by the synchronically suppletive plural dhëndë < *domH-it-eh₂, 
literally ‘the tamed (collective of animals)’. 


13.4.5 Albanian and Italic 


As stated by Huld (1984: 168): “Relations between Albanian and Italic are 
largely negligible”. Most prominent among the vanishingly few shared innov- 
ations is the lexical pair Alb. bir ‘son’, bijë ‘daughter’ (as well as Mess. bilia 
‘daughter’), which is likely identical to Lat. filius, filia, respectively (Hyllested 
2020: 421–2). Albanian hi, Geg hî, def. hini ‘ashes’ < *sken-is- seems to agree 
in ablaut with Lat. cinis ‘cold ashes’ < *ken-is- vs. Gr. κόνις ‘dust; ashes’ and 
Toch.B kentse ‘rust’ < *koniso-, but both forms are probably old in IE, and the 
equation with Albanian is far from certain anyway (Hyllested 2012: 76 n. 4). 


13.4.6 Albanian and Indo-Iranian 


Jokl (1963: 152), in his somewhat inconclusive posthumous work, listed eight 
lexical parallels between Albanian and Indo-Iranian, almost none of which, 
however, constitute exclusive isoglosses, as Jokl himself acknowledged. Even 
his flagship first item, Alb. dhëndër(r), Geg dhândër(r) ‘son-in-law; bride- 
groom’, which on the surface looks like the same *-ter formation from PIE 
*ǵem(H)- as Ved. jā́mātar-, YAv. zāmātar- ‘son-in-law’, may simply owe its 
-d- to inner-Albanian epenthesis as in the rhyming word ëndër(r) ‘dream’ < PIE 
*Hon-r-io-, while Indo-Iranian *-tar can be analogical from other kinship 
terms. In that case, Albanian formally agrees with Lat. gener and Gr. 
γαμβρός instead.7 

Orel's (2000: 260) more recent list of ten items suffers from the same 
conspicuous weaknesses; for example, Alb. thadër ‘double-sided axe’ does 
not actually form a unique isogloss with Ved. Br.+ śastrá- ‘knife; sword’, since 
Lat. castrum ‘knife’ represents an identical formation < PIE *ḱas-trom, lit. 


words from the same semantic field are Greek borrowings: pellg ‘pond; basin; depth’ = πέλαγος 
‘sea’ and zall ‘riverbank, river sand’ = αἰγιαλός ‘sea-shore’. 

7 The irregular and unparalleled plural dhëndurë, North Geg dhándórré is probably due to later 
conflation with Lat. genitōrēs ‘begetters’ (i.e., of heirs, cf. Eng. beget an heir), where the 
significant position of the plural must be seen in the light of traditional Balkan household 
structures with several married couples under one roof. 


‘cutting-instrument’. A critical assessment of some further oft-mentioned items 
is provided by Huld (1984: 167). 


13.4.7 Albanian and Greek 


As noted above, our ultimate assessment treats Albanian and Greek as particu- 
larly close relatives within Indo-European. We find the number of innovations 
shared only by Albanian and Greek to be overwhelming, thus pointing com- 
pellingly to a Helleno-Albanian subgroup. In this section, we offer an overview 
of shared developments, without claiming exhaustiveness. The evidence is 
mostly morphological and lexical in nature, involving particular lexical items 
or details of word-formation, but there are also several phonological 
commonalities.8 


13.4.7.1 Phonology 


1. Initial *i- has a twofold reflex in both languages: (a) an obstruent *dz- > Alb. 
gj-, Gr. ζ-, which already appears in Mycenaean, vs. (b) a preserved *j- > 
Alb. j-, PGr. *j-, which later yielded h- in early Greek, but is still partially 
retained in Mycenaean. For Greek, the conditioning is famously disputed.9 
Despite the fact that a similar double reflex between j- and gj- has long been 
recognized in Albanian,10 it has hitherto gone unnoticed that the distribution 
between individual lexemes is identical in both languages: Alb. n-gjesh 
‘knead’ (< *iós-(i)ie-) ~ Gr. ζέω ‘boil, seethe’ < *ies- ‘boil; ferment’; Alb. gjesh 
‘gird’ ~ Gr. ζώννυμι ‘id.’ < PIE *ieh₃s-; Arbëresh gjër ‘soup’, Geg gjânë 
‘silt, mudbed’ < *iouh₃-(m)n-o- ~ Gr. ζύμη ‘sourdough’, ζωμός ‘sauce; 
broth’ < *ieuh₃-s- ‘mix sth. moist’; vs. Alb. ju ‘you (2pl.)’ ~ Gr. ὑμεῖς ‘id.’ 
(although the latter may instead continue PIE acc. *us-me); Alb. a-jo ‘she’ ~ 
Gr. rel. pron. f. ἥ < *ieh₂; and Alb. josh ‘fondle, caress’ < *ieudʰ-s- (cf. for 




Space does not allow a word-by-word treatment of purported isoglosses whose validity for 
various reasons we reject. A few examples may illustrate: Alb. eger ‘wild’ must be borrowed 
from Gr. &ypioc ‘id.’, not a cognate, since the PIE root has *-g-, which yields Alb. dh. The 
singularized plural dhemje ‘caterpillar; maggot’ is unrelated to Gr. óeueAéac ‘leech’; the variant 
vemje shows it is instead a borrowing from the Slavic collective noun *vormpje ‘insects and 
worms’ with regular development of v- > dh- / VCC where one consonant is a labial. And while 
Alb. derr ‘pig’ ~ Gr. yoipoc ‘boar’ clearly point to a common protoform *2"ór-io-s, this is likely 
not a Helleno-Albanian innovation since Finn. karjas *wild boar' suggests a loan from an 
otherwise unattested Proto-Germanic counterpart *garjaz (Hyllested 2020: 412 n. 26). 

It is likely that the distribution is based on the presence vs. non-presence of laryngeals, as 
proposed for Greek by Peters (1976): *i- > C- vs. *Hi- > '-; however, other scholars see exactly 
the reverse distribution here (e.g. LIV?). Either way, it is significant that Greek and Albanian 
agree on which lexemes show which reflexes. 

See Kortlandt 1996 for a summary of the various scholarly views regarding the Albanian 
material. 


o 
the meaning Lith. jauda ‘seduction’) ~ Gr. ὑσμίνη ‘battle’ < *i̯udʰ-s- <
*i̯eudʰ- ‘care for, be engaged in’.

2. In both Albanian and Greek, the original clusters *ti̯ and *di̯ underwent
affrication to *ts and *dz, and in initial position, the former further
assibilated into *s-. In Albanian, assibilation was ultimately completed in all
positions, resulting in s and z, a development which happened late enough
to affect Latin loanwords. The only relevant lexemes shared by both
languages involve the voiced cluster: Alb. Zoj-z ‘Albanian sky god’ ~ Gr. Ζεύς
< *di̯ḗus (Mann 1952: 32) and Alb. dhjes ‘to shit’ (with secondary final
devoicing) ~ Gr. χέζω ‘id.’ < *ǵʰed-i̯e/o-.

3. PIE thorn clusters with a labiovelar retain the rounding (Section 13.5.1). 
While this is in itself an archaism, scholars who do not believe in the Core IE 
thorn-cluster metathesis will see a clear shared innovation here. 

4. The two languages share many developments of clusters containing sonants.
For example, *-s- was lost with compensatory lengthening before a sonant,
e.g. Alb. dorë ‘hand’ < *ǵʰērā < *ǵʰés-rā ~ Gr. χείρ ‘id.’ < PIE *ǵʰes-r and
Alb. krua ‘spring’ m., pl. kronj ~ Gr. κρήνη, Dor. κράνᾱ ‘spring, well’ <
*kras-neh₂ ~ PGmc. *hraznō ‘wave’ (> OE hærn, ON hrǫnn).


13.4.7.2 Inflection and Morphosyntax 


1. Under the assumption of a set of distinct past tense middle voice endings in
PIE, as suggested by parallels between, e.g., Greek and Sanskrit, e.g. 3sg. -το ~
-ta, 1pl. -μεθα ~ -mahi, 3pl. -οντο ~ -anta, it is interesting that both Greek and
Albanian have formations with specifically active past endings in a non-active
past paradigm. That is, in the aorist passive, as opposed to middle forms with
the endings given above (-το, etc.), Greek adds active endings to the passive
stem, e.g. 1sg. ἐπλύθη-ν ‘I-was washed’ / 2sg. ἐπλύθη-ς ‘you-were washed’,
etc. (for the endings, cf. active imperfect 1sg. ἔπλυνο-ν ‘I-was washing’ / 2sg.
ἔπλυνε-ς ‘you-were washing’); similarly, Albanian uses active forms with the
formative u (based on the PIE reflexive element *su̯e), e.g. u lava ‘I-was
washed’ / u lave ‘you-were washed’ (for the endings, cf. active past lava
‘I-washed’ / lave ‘you-washed’). These past forms with active endings are in
addition, in both languages, to inherited special present medio-passive endings
(e.g. 1/2/3sg. Gr. -μαι/-σαι/-ται, Alb. -m/-sh/-t). It thus appears that both have
innovated to use ostensibly active endings in a past passive formation.

2. As pointed out in footnote 3, both Albanian and Greek show the inherited use of
the negator *meh₁ in prohibitives. Additionally, though, both also show
innovative uses of *meh₁ not found elsewhere in IE. Specifically (cf. Joseph 2013),
uses of *meh₁ in negating non-finite forms (e.g. Alb. për të mos dështuar ‘(in
order) to not fail’, Gr. τὸ μὴ προμαθεῖν ‘(the-state-of) not knowing beforehand’),
in tentative questions (e.g. Alb. mos e njihni? ‘do you perhaps know him?’,
Gr. μή σοι δοκοῦμεν ‘do we perhaps seem to-you ... ?’), and in introducing
‘fear’ complements (Alb. kam frikë mos e kam infektuar ‘I-have fear lest
I-have infected him’, Gr. δέδοικε μὴ διαφθαρῶ ‘he-feared lest I-be-corrupted’)
are all functional innovations found exclusively in Albanian and Greek.


13.4.7.3 Verb Formation 


1. One of the most characteristic innovations shared by Albanian and Greek is
a group of new productive verbal present types combining a nasal present and
an i̯-present. They sometimes build on old nasal presents such as *h₁eubʰ-n-i̯- >
Alb. venj ‘weave’, Gr. ὑφαίνω ‘weave’ (Porzig 1954: 178; cf. Ved. ubhnā́ti),
sometimes not (see Section 13.4.8 on *bʰeh₂- ‘shine’ > Alb. bëj ‘does’, Gr.
φαίνομαι ‘appear’). They may even be denominal, as with Alb. thaj, Arbëresh
thanj ‘dry up’ ~ Gr. αὐαίνω < *saus-n-i̯-, denominative to *saus-o- ‘dry’ (Gr.
αὖος).

2. Relatedly, both languages often create simple secondary i̯-presents for verbs
with roots ending in a sonant; they share at least three such verbs:

a. PIE *ten- ‘to stretch’: nu-present *tn̥-néu- (cf. Ved. tanóti) → *ten-i̯e- in
Alb. n-de(n)j and Gr. τείνω

b. PIE *der- ‘tear apart’: thematic present *der-e- → *der-i̯e- in Alb. djerr
‘destroy’ ~ Gr. δείρω (alongside δέρω) ‘to skin, flay’ (pace Orel 1998: 69
and LIV² 119-20)

c. PIE *dʰgʷʰer- ‘flow; diverge, perish’: thematic present *dʰgʷʰer-e- >
*gʷʰþer-i̯- (cf. Section 13.5.1 and compare Ved. kṣárati ‘flow; wane,
perish’, Av. γžaraiti ‘flow’).

3. As mentioned in Chapter 12, a new type of s-aorist arose in the broader
Balkanic subgroup already, formed with *-eh₂-s- to denominative verbs in
*-eh₂-i̯e-. By analogy, Albanian and Greek agree on forming an s-aorist to
the PIE root *deh₂i̯- ‘share, divide’, cf. Alb. (n-)dava, Gr. ἐδαισάμην ‘I
shared’ vs. the old root aorist in Ved. (ava) adāt ‘split off’ (LIV² 103-4).

4. The OAlb. 3sg.aor. u n-gre ‘arose’ reflects the same innovated thematic
aorist *h₁gr-e/o- as Homeric Gr. ἔγρετο ‘woke up’, to the root *h₁ger- ‘wake
up’, replacing an original athematic aorist (Schumacher 2017).

5. Several verbs co-occur with *peri- ‘around’ in both languages:

a. *peri-kʷl̥-n-h₁- ‘turn around’ > Alb. për-kul ‘to bend, curve’ ~ Gr.
περι-τέλλομαι ‘go in circles’ (LIV² 386)¹¹

b. *peri-seh₂g- lit. ‘drive around’, lexicalized as Alb. për-gjoj ‘listen closely;
eavesdrop’ ~ Gr. περι-ηγέομαι ‘explain, describe’ (alongside ‘lead around’)


11 Although the context in which OIr. do-air-chella ‘conceals’ is attested also allows for
a translation ‘encloses (of water)’, ar-cela alone means ‘takes away, steals’, and it rather
contains the PIE root *ḱel- in celim ‘hides’ (Edel 2006: 83 n. 46; Le Mair 2011).

c. *peri-pekʷ- ‘bake all over’, lexicalized as ‘crust over’ > Alb. noun
për-peq ‘colostrum pudding’, secondary from the pl. of *për-pak ~ Gr.
περι-πέσσω metaph. ‘gloss over, cajole’.

6. The Albanian copula is prefixed with *h₁en-: Geg âsht ~ Tosk është ‘is’ <
*h₁en-h₁esti corresponding to Gr. ἔνεστι ‘is in’ alongside short forms in Tosk
ë and Koine ἔνι (cf. Hamp 1980; Joseph 2016).


13.4.7.4 Nominal Formation 


1. Across IE, for deriving adjectives from *sal- ‘salt’, various suffixes are
found, e.g. *-iko- in Germanic (e.g. NHG salz-ig), *-no- in Slavic (e.g. Ru.
sol-ën-yj), but both Albanian and Greek show parallel formations with an
*-m- suffix alone or together with *-i-: Alb. n-gjel-m-ët ‘salty’ ~ Gr. ἅλιμος
‘of the sea’, ἁλ-μ-υρός ‘briny’.

2. Based on the need for *ā or *ē in the preform of Albanian sot ‘today’, in
order to motivate the o-vocalism, Joseph (2013) posits a pre-Albanian
adverbial composed of a deictic element *ki with *āmer for ‘day’,
*kj-āmer-, ‘this day’; later, after a metanalysis to *kjā-mer-, the more
usual word for ‘day’, *diti-, replaced *(ā)mer, giving *kjā-diti, from
which sot developed regularly. This lexeme occurs also in Greek (cf.
ἦμαρ, ἡμέρα) and Armenian (awr), so its presumed occurrence here may
link Albanian, Greek, and Armenian, but the use of this form in the word
for ‘today’ specifically links Albanian and Greek, since Greek has σήμερον
(Attic τήμερον) < *kj-āmer-o-m.¹²

3. Alb. bot ‘someone; person’, botë ‘world; humanity; others’ < a concretized
acrostatic t-stem noun *bʰu̯eh₂-t- ‘living being’ < abstract ‘becoming’ ~
*bʰu̯eh₂-t-éh₂, collective of *bʰu̯eh₂-t-ó- ‘having life’, respectively ~ Gr.
φώς, gen. φωτός ‘man; mortal’ < *bʰu̯oh₂-t- (Kashima 2019).

4. Alb. huaj ‘stranger (sb.); foreign, alien (adj.)’ formally matches Gr. ξένιος,
an epithet of Zeus derived from ξένος, Dor. ξένϝος, Ion. ξεῖνος (‘id.’;
Porzig 1954: 178). The protoform *ksenu̯o- < *gʰs-en-u̯o- contains the
same root as NW IE *gʰós-ti-s ‘guest; host’. The lengthening in Albanian
(-ua- < *-ō- < *-ē-) is compensatory from the loss of *-u̯- (< *ksenja- <
*ksennja- < *ksenu̯jo-; Hyllested 2013).

5. A new term *ǵʰersos ‘dry land, fallow land’ from the root *ǵʰers- ‘stiff’ >
Alb. djerr ~ Gr. χέρσος (curiously reminiscent of Italo-Celtic *tersos ‘id.’
from the root *ters- ‘dry’).


12 It is tempting to see the metanalysis to *kjā- as a shared Albanian-Greek feature, since Greek
shows the same development; cf. Mycenaean za-we-te ‘this year’, from *kjā-wetes (note later
σῆτες, Attic τῆτες).
6. A derivative *spor-eh₂ ‘seed; semen’ from the root *sper- ‘spread, strew’ >
Alb. farë ~ Gr. σπορά.¹³

7. A result noun *ǵʰud-tlo- from the root *ǵʰeu̯d- ‘pour’: Alb. dyllë ‘wax;
sap’, Gr. χυλός ‘juice’ (Porzig 1954: 178; Huld 1984: 165). The lengthening
reflected in Alb. -y- is compensatory from the loss of *-d(s)t-, not a sign
of Winter's Law in Albanian (cf. Section 13.4.1).

8. An instrument noun *ḱemt-trom ‘stinger’ > Alb. thundër ‘hoof’ (with -un-
from *-em- as in tundoj ‘tempt’ = Lat. tempto; same root as in Alb. thua
‘nail’ and thumb ‘bee's stinger; thorn; arrowhead point’) ~ Gr. κέντρον
‘point, goad; nail’ (borrowed into Geg as cândër, qândër ‘forked shoring
pole; prop’).

9. A derivative *h₃od-meh₂ ‘smell’ > Tosk amëz, Geg amë ‘scent; flavour’ ~
Gr. ὀδμή ‘stench’ vs. Lat. odor ‘smell’, Arm. hot ‘smell; savour’ (Huld
1984: 165).

10. Hamp (2015: 15) found a common collocation in Alb. bie erë ‘smell’ <
*bʰer- + *h₃od-r-eh₂ vs. Gr. ὀσφραίνομαι ‘to smell’ < *h₃od-s- + *bʰer- lit.
‘carry odour’.

11. The name of the Albanian dawn-goddess, goddess of love and protector of
women, Premtë, P(ë)rende corresponds regularly to the Greek name
Περσέφαττα, a variant of Περσεφόνη, which Janda (2000: 224-50)
convincingly traces back to *pers-é-bʰ(h₂)n̥t-ih₂ ‘she who brings the light
through’. The development of -bʰnC- would be the same as in venj
‘weave’ < *vemj- < *h₁eubʰ-ni̯- (cf. Section 13.4.7.3 (1)); regarding Alb.
-r- from originally pretonic -rs-, cf. ter ‘to dry’ from the PIE causative
*tors-éi̯e-.

12. In both Albanian and Greek, two PIE u-stems, *ǵén-u ‘knee’ and
*dor-u ‘tree’, occur with -n-extensions: Alb. gju ‘knee’, Geg gjû, def.
gjûni (cf. Section 13.4.3) and dru ‘tree’, Geg drû, def. drûni ~ Gr. γόνατος
(alongside original γόνυ) and δόρ(ϝ)ατος (Huld 1984: 165).

13. PIE *h₂endʰos ‘meadow vegetation’ acquired the meaning ‘flower’ in both
Alb. endë and Gr. ἄνθος vs. Arm. and ‘field’, Ved. ándha- ‘herb’, Toch. B
ānte, A ānt ‘plain’ (Huld 1984: 164; Kortlandt 1986: 39). From this noun,
a new verb *(h₂)andʰ-éi̯e- was derived, yielding Alb. ëndem, Gr. ἀνθέω
‘blossoms’. Formally, they correspond to Arm. andem ‘cultivate’ (Danka
& Witczak 1995: 124), but the meaning differences suggest that the
Armenian derivation happened independently.

14. The Albanian o-grade derivative darkë ‘supper, dinner; evening’ matches
Gr. δόρπον ‘evening meal’ < *dórkʷom (Porzig 1954: 178; Jokl 1963:


13 Alb. farë meaning ‘affinity; kind’ is historically a different word, borrowed from Langobardic
fara ‘military clan’ into almost all Balkan languages, including Romanian, Bulgarian, and
Modern Greek.
154); the root is not isolated if akin to Bret. dibri, dribi ‘eat’ (per Hamp
1966).

15. It has long been known that Alb. për-pjetë ‘steep’, prep./adv. ‘upwards’,
noun f. ‘hill, slope’, from *pro-peth₂-o- corresponds accurately to Gr.
προπετής ‘falling forwards’, containing the root of πέτομαι ‘fly’ (Orel
1998: 321 with references). But it has gone unnoticed that the phrase
underlying the counterpart tatë-pjetë ‘slope; (adv.) downhill’ (Orel 1998:
450) also occurs in Gr. κατα-πίπτω ‘fall down’.


16. If Nikolaev (2009: 195) has correctly derived Arm. leaṙn ‘mountain’ and
OIr. lie ‘stone’ from *leh₂u-r, *leh₂u-n-, then Albanian and Greek agree on
a secondary thematic derivative *leh₂u-r-eh₂ ‘rockfall’ > Alb. lerë ‘boulder;
stone heap’ ~ Gr. (Attic) λαύρα, Ep. Ion. λαύρη ‘narrow passage, alley’
(so too Jokl 1934: 46-8).¹⁴


17. Albanian and Greek agree on a -no-derivative *ku̯ap-no-s ‘smoke’ > Gr.
καπνός ‘smoke’, Alb. kem ‘incense’ vs. other derivatives in Lat. vapor
‘steam’, Lith. kvapas ‘breath; smell’ (Porzig 1954: 177).

18. An -i- in the stem of *ḱou̯H-(i-)lo- ‘hollow; empty’ is reflected only in Alb.
thellë ‘deep; dark(-coloured)’, Gr. κοῖλος, κόϊλος, Myc. ko-wi-ro ‘hollow’
(Porzig 1954: 177; differently Huld 1978).


19. PIE *gʷelH- ‘torment, sting’ in words for ‘sewing needle’ > Alb. glep,
gjep, gjilpërë, Geg gjylpânë ~ Gr. βελόνη (Irslinger 2017: 312). The
Albanian suffix -ërë, -ânë even formally matches Gr. -όνη < *-mn-eh₂
(Olsen 1999: 492; Rasmussen 1996: 154), used in denotations for
instruments and remedies.

20. Alb. bar n., pl. barëra, Geg barna ‘grass; medicine’ ~ Gr. φάρμακον ‘drug,
medicine’ < *bʰar-(m)n- (Jokl 1963: 129), derived from the Core IE root
*bʰar- which denotes crops everywhere else (e.g. Lat. far ‘spelt’, Eng.
barley).

21. Alb. ndër-dym ‘in doubt’ formally corresponds to Gr. διά ‘apart, through’
< *du̯is-m̥ ‘in two (parts)’ (Mann 1952: 32).

22. A pronoun *h₂auto- ‘self’ occurs in Alb. vetë, Gr. αὐτός (Witczak 1997:
216); also shared with Phrygian (avtos; see Section 11.4.2).


13.4.7.5 Semantic Innovations (Selection) 


1. PIE *seh₂g- ‘seek’ → ‘drive’: Alb. gjuaj ‘drive (quickly), chase’, Gr.
ἡγέομαι ‘lead the way, guide’ (cf. Section 13.4.7.3 (5b)).


14 Milyan lakre is formally identical to the Helleno-Albanian word, but possibly means ‘stone
tablet’ (Nikolaev 2009: 196).

15 Arm. soyl ‘cave’ appears to be a ghost form and would reflect *ḱou̯H-lo- anyway (Zair 2011:
166 n. 5).
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2. *lóg'-o- ‘resting-place’ (Slavic */ogd ‘lair’, Toch. B leke *bed") — ‘camp’ 
— ‘troop, band’ in Alb. dial. lag, Gr. Adyo¢ (Hyllested 2020: 410-11). 

3. bhih;-mn ‘growth’ (Ved. bhiiman *world, region (n.); multitude, wealth' 
(m.)) — ‘plant’ in Alb. bimë, Gr. põua (Mann 1950: 387). 

4. *h>end'os ‘meadow vegetation’ — ‘flower’ (Section 13.4.7.4 (13)).'° 

5. *hjerg'- ‘go; jump up’ — ‘come’ in Alb. erdh- aor., Gr. &pyouoı. 

6. *h;éh,tr ‘stomach; intestines’ (PGmc. *epro ‘veins, entrails’, e.g. > OE 
cedre, also ‘sinew; kidney’) — ‘heart’: in Alb. voter, vater", Gr. "top. 

7. *kras-neh, ‘wave’ (Section 13.4.7.1 (4)) — ‘spring, well’ in Alb. Arua, Gr. 
kpnvn; compare Eng. well ~ NHG Welle ‘wave’, Lith. vilnis, Ru. volná ‘id.’ 


13.4.8 A Palaeo-Balkanic Group?


Evidence for a broader Balkanic group consisting of Albanian, Greek, and
Armenian, as well as Phrygian, is presented in Section 12.4.1 and (mainly)
Section 12.5.¹⁸ To this we can add:

1. A new possessive pronoun *emos ‘mine’ > Alb. im(e), Gr. ἐμός, Arm. im,
perhaps dissimilated from an old accusative me-me (Huld 1984: 165 with
references).

2. A suppletive aorist *gʷerh₃- to the verb ‘eat’, irrespective of the origin of the
present stem. Compare Alb. ha, aor. n-grë; Gr. ἔδω, ἐσθίω, aor. ἔφαγον,
ἐ-βρώ-θην; Arm. owtem, aor. ker- (Holst 2009: 87).

3. By the same analogy described in Section 13.4.7.3 (3), the old
root aorist of PIE *steh₂- ‘stand’ was replaced with an s-aorist *steh₂-s- with
factitive semantics in Alb. shtova ‘added’, Gr. ἔστησα ‘made stand’, Arm.
stacʿay ‘acquired’, Phryg. estaes ‘erected’, and Mess. stahan ‘erected’ (Søborg
2020: 76).

4. A new root *klau- ‘to cry’ > Alb. qaj, OAlb. klanj < *klau-ni̯- ~ Gr. κλαίω,
Arm. lam ‘to cry’.

5. The originally honorific term *h₂ner- ‘man (of consequence)’ has replaced
*u̯iHró- as the common word for ‘man’, Alb. njeri, Gr. ἀνήρ, Arm. ayr
(Huld 1984: 165).

6. Generalized full-grade in the word for ‘louse egg’: Alb. thëri, Geg thëni <
*ḱonid-, Gr. κονίς and Arm. anic (dissimilated from *sanic) vs. zero-grade


16 Changes in specific plant-names (e.g. Alb. ah ‘beech’ ~ Gr. ὀξύα ‘id.’ vs. ‘ash tree’ elsewhere)
are not included here as they may reflect new geographical surroundings rather than genealogy.

17 Synchronically identical to votër, vatër ‘fireplace, hearth’ (understood as the middle of the
house) due to merger with PIE *h₂eh₁-tr̥ ‘id.’.

18 We can embrace most of the evidence adduced there although we note that (1) Alb. edh ‘goat’
may simply be borrowed from Lat. haedus, cf. Rom. ied (Witczak 1997: 125); (2) the locative
plural ending *-si is not secured for Albanian since even *-su may yield the attested outcome -sh;
and (3) awr ‘day’ etc. was probably not originally restricted to Greek and Armenian (Section
13.4.7.4 (2)). On Alb. grua ‘woman’, see also Opfermann (2017).
*knidā in Germanic and Balto-Slavic: OE hnitu, Latv. gnīda, SCr. gnjida
(Huld 1984: 165).

7. *ster-ih₂ ‘sterile (of females)’ > Alb. shtjerrë, Gr. στεῖρα, Arm. sterǰ
(Hyllested 2016; on the Greek-Armenian connection see Lamberterie 2013).

8. Perhaps PIE *kʷei̯- ‘gather’ > Palaeo-Balkanic ‘make’ (Section 13.4.2).

There is also some evidence for a broader Balkanic unity wherein further
developments set Albanian and Greek apart from Armenian, again pointing
to a Helleno-Albanian subgroup:

9. PIE *bʰeh₂- ‘shine’ (LIV² 68-9) forms a nasal present in Albanian, Greek and
Armenian, but only Albanian and Greek add an extra i̯-present to it, following
a productive pattern (Section 13.4.7.3 (1)): Armenian banam < *bʰeh₂-n- vs.
Alb. bëj, Geg bâj ‘does’, Greek φαίνομαι ‘appear’ < *bʰeh₂-n-i̯-.

10. A derivative *Hon-r-i̯o- (alongside archaic *Hon-r) ‘dream’ occurs in Alb.
ëndërr ~ ëndër and Gr. ὄνειρος ~ ὄναρ vs. Arm. anowrǰ (< *Hnor-i̯o-), all
‘dream’ (Lamberterie 2013; Kortlandt 1986: 38; Witczak 1997: 126). Its
root is not found elsewhere, but the heteroclitic declension points to an IE
retention in Palaeo-Balkanic.

11. A derivative *h₁ed-un-eh₂ ‘pain’ > Alb. dhunë, dhurë f.pl. ‘damage, injury;
shame, disgrace’ = Gr. ὀδύνη ‘pain’ alongside the older *h₁ed-u̯ōn- in Arm.
erkn ‘labour pains’ and e.g. secondary *h₁ed-ōn in OIr. idu (not *-u̯ōn-
since *-du̯- > OIr. -db-).

And in one case, an Armenian innovation isolates it from a Helleno-Albanian
remainder:

12. The word for ‘bee’ is derived from *mel-it ‘honey’ in all three languages
(Holst 2009: 90): Alb. mjaltë ‘honey’ ~ bletë, mjalcë ‘bee’, Gr. μέλι ‘honey’ ~
μέλισσα, μέλιττα ‘bee’, Arm. mełr, -ow ‘honey’ ~ mełow ‘bee’, but Armenian
has -u- by influence from PIE *médʰu ‘mead’ (Clackson 2017: 112).


13.5 The Position of Albanian 


13.5.1 Broader Connections within IE: Albanian and the Centum-Satem 
Division 

Starting with reconstructed PIE with a three-way distinction in the guttural
consonants (palatals, e.g. *ḱ, velars, e.g. *k, and labiovelars, e.g. *kʷ),
a division within IE is possible, descriptively, into branches that merge
palatals and velars (so-called centum languages) and those merging velars
and labiovelars (satem languages). The satem languages also show
affrication and/or assibilation of the PIE palatals. We say “descriptively” because
we do not see this division as a basically genealogical one within IE. For us,
the centum languages are not a coherent dialectal or genealogical subgroup
though the satem languages might be. The position of Albanian within this
scheme is thus of considerable interest and, not surprisingly, is somewhat
complicated.
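
The descriptive definitions above can be restated as a small decision procedure. The following Python sketch is only an illustration: the function name `classify` and the reflex strings are hypothetical, and the input is assumed to be one simplified reflex per PIE guttural series in a single phonological environment.

```python
def classify(palatal: str, velar: str, labiovelar: str) -> str:
    """Label a set of reflexes by which PIE guttural series have merged.

    The three arguments are the (simplified, illustrative) reflexes of
    the palatal, velar and labiovelar series in one environment.
    """
    if palatal == velar == labiovelar:
        return "all merged"
    if palatal == velar:
        # palatals and velars merged: centum pattern
        return "centum"
    if velar == labiovelar:
        # velars and labiovelars merged: satem pattern
        return "satem"
    # all three series still distinct
    return "neither"


# Schematic centum-type and satem-type patterns
print(classify("k", "k", "kw"))
print(classify("s", "k", "k"))
# Albanian before original front vowels, using th / k / s as in
# tho-të, kohë and sorrë
print(classify("th", "k", "s"))
```

The last call returns "neither", which is the descriptive sense in which Albanian falls outside both labels in that environment.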

In particular, while Albanian shows some merger of labiovelar and velar, e.g. 
pjek ‘to cook’ < *pek"- (cf. Gr. nenov ‘ripe’) and plak ‘old man’ < *plo»k- (cf. Lith. 
pilkas “grey’), it also maintains the original three-way guttural distinction in some 
environments, and thus descriptively is neither centum nor satem. As recognized by 
Pedersen 1900, they all show distinct outcomes before original front vowels, e.g. 
tho-té ‘says’ < *ke-ti < *keh,-ti (cf. Old Persian 9a-tiy), kohë ‘time’ < *késko- (cf. 
OCS čas» ‘hour’), and sorrë ‘crow’ < *k"ersno- (a vrddhi derivative of ‘black’, cf. 
Sanskrit krsna-). In this way, Albanian behaves like Luvian, as analyzed by 
Melchert 1987. Moreover, since elsewhere in Anatolian, centum-like mergers 
happened independently (e.g. Hitt. ki-ta ‘lies’ < *kei-, cf. Ved. Br.+ sé-te), centum- 
ness cannot be considered a significant innovation. In fact, centum-ness seems 
relevant only for post-Anatolian and post-Tocharian IE, and really equates to just 
Italo-Celtic and Germanic; satem-ness, by contrast, equates to Balto-Slavic and 
Indo-Iranian (and could be a real shared innovation between them). An ancient 
Balkan group, including Armenian, Albanian, and Greek, appears like a potpourri, 
making up a third unit which initially kept all original stop distinctions; various 
developments in its individual sub-branches subsequently obscured this basic 
retention, e.g. the Albanian *K/*k" merger in some environments noted above, or 
the assibilation seen in sjell ‘bring’ < *k"el- (cf. Gr. ne/o “be in motion"). 

Albanian lexemes with initial clusters vd- and ft- are of special interest in this
context. Previous etymologies of the two clearest examples, Alb. vdjerr ‘to
disappear’ and vdes ‘to die’ (stem vdek- as in the participle vdekur ‘dead’), all
involve a semantically vague labial prefix v- supposedly added to known verbal
stems (e.g. Mann 1952; Orel 1998; Hamp 2004; Holm 2011). However, a less
dichotomous centum-satem division, with Balkan languages showing
characteristics of both, allows for a more economical analysis of these Albanian verbs as
regular reflexes of Core IE “thorn clusters” containing a labiovelar. Thus, Alb.
vdjerr can simply correspond fully to Gr. φθείρω ‘destroy, ruin’, med. φθείρομαι
‘perish’, even down to the i̯-present, from Core IE *gʷʰþer- < PIE *dʰgʷʰer- ‘flow;
melt away; disappear’, and no prefix need be posited. Similarly, vdes could
straightforwardly contain the Core IE root *gʷʰþei̯- < PIE *dʰgʷʰei̯- ‘decline;
perish’ also seen in Gr. φθί(ν)ω ‘perish’, φθίμενοι ‘the dead’, though formally
from a causative *gʷʰþoi̯-kʷ-éi̯e- ‘leave behind’ (> ‘depart’).¹⁹

An important consequence of this interpretation is that, since Albanian v- or 
f- reflects the old labiovelar, the dental -d- must continue the PIE thorn element. 
This, in turn, would mean that the common view that Albanian agrees with 


19 A candidate for a reflex of the unvoiced counterpart *tkʷ- might be Alb. ftik ‘dry’ ~ Lat. siccus
‘dry’ (< *sicus), if from a PIE *tkʷiH-ko- or *tkʷei̯-ko-, possibly also reflected in PGmc.
*swīþan- ‘scorch’ and/or Gr. ψι-λός ‘bare’. None of these words have generally accepted
etymologies.
Balto-Slavic, Germanic, and Italic in preserving only the dorsal part of palatal
thorn clusters (as if *ǵʰþom ‘earth’ and *ǵʰþi̯es ‘yesterday’ were *ǵʰom and
*ǵʰi̯es, respectively) must be abandoned. Although the regular reflex of
a palatal *ǵ(ʰ)- in Albanian is d(h)- as well, the sole consonant left in dhe
‘earth’ and dje ‘yesterday’ must then reflect the thorn element and not the
dorsal, which disappears without a trace.

The above analysis has important consequences for the internal classification
of IE:

1. It makes Albanian more of a centum language, since it preserves not only the
velar-labiovelar distinction but even the actual rounding of labiovelars.
2. It distances Albanian from Balto-Slavic, Germanic, and Italic, which all
agree on preserving the stop part of thorn clusters only.
3. It connects Albanian even more to Greek than previously assumed.


13.5.2 Conclusion 


As noted at the outset, the relationships that Albanian shows within IE are 
complicated, and the evidence discussed here should make that point abun- 
dantly clear. We have surveyed the most striking possible connections that 
Albanian shows with other branches of Indo-European, based on key pieces of 
evidence.²⁰ Technically speaking, from a genealogical standpoint, Messapic
likely is the closest IE language to Albanian (Matzinger 2005). However, in the 
absence of sufficient evidence, that connection must remain speculative. 
Among the other connections, leaving aside the broad centum-satem param- 
eter, since we do not see it as a valid dialect division in the usual sense, we are 
left with the following, listed from the least compelling (with Italic) to the most 
compelling (with Greek): 

Albanian and Italic 

Albanian and Celtic 

Albanian and Indo-Iranian 

Albanian and Germanic 

Albanian and Balto-Slavic 

Albanian and Armenian 

Albanian and Armenian, Greek, Phrygian, and Messapic (etc.) 

Albanian and Greek 
These are not necessarily mutually exclusive, depending on one's overall concep- 
tion of the interrelationships among all branches of IE. That is, some apparent 


20 We have deliberately restricted ourselves to the best evidence, leaving out some intriguing 
shared substratum words such as Alb. dellinje, délli ‘juniper’ ~ Gr. (Hsch.) oy&Aıvos ‘wild 
cypress or juniper’, indicating a protoform *(s)g"elin-(i)o- (Danka & Witczak 1995: 132); and 
formations containing isolated roots such as *uisg'-i(i)o- > Alb. vithe ‘haunch, especially of 
a horse’ ~ Gr. ioxiov ‘hip-joint; loins, haunch’ (Mann 1952: 39). 
shared innovations could in principle result from wave-like diffusion in prehistoric
times. Moreover, as noted throughout, one has to ask whether limited evidence for 
a particular linkage goes beyond what any two branches might show. 

Ultimately, though, as indicated, the preponderance of evidence favours
a close connection between Albanian and Greek,²¹ possibly as a subset
within a “Palaeo-Balkanic” group with Armenian and Greek, as well as
Phrygian, Messapic, and other fragmentarily attested languages (see
Figure 13.1). The Albanian-Greek connection that we argue for here is
particularly interesting in the light of the computational phylogenetic study
of the interrelationships among IE languages reported on in Chang et al.
2015. In that paper, starting with the same model and data set as earlier
phylogenetic studies (especially Bouckaert et al. 2012, 2013), but with a key
difference in that they “constrained eight ancient and medieval languages to
be ancestral to thirty-nine modern descendants” to allow for greater
accuracy, the authors develop an “analysis with modern languages from all IE
subfamilies” (Chang et al. 2015: 199-200) in which Albanian, represented
by Arvanitika and Tosk,²² ends up in their resulting tree diagram of IE
relationships as being most closely connected to Greek. Different methods
and different IE data sets and different assumptions can of course yield
different results, but we take heart from the convergence of our more
traditional qualitative assessment of Albanian's closest relative and the
computational quantitative assessment by Chang et al.


[Tree diagram not reproduced. Recoverable node labels: (Palaeo-)Balkanic; Armenian;
Graeco- (label truncated); Phrygian; Graeco-Albanian; “Illyric”; Messapic; Albanian;
Geg; Tosk; Mainland Tosk; Arvanitika; Arbëresh.]

Figure 13.1 The position of Albanian 


21 In line with our interest in just presenting the best evidence, we have focused on shared
innovations. However, shared retentions can in principle, if unusual enough compared to the
rest of the family, and especially when paired with significant shared innovations, point to
a close genealogical connection; in a certain sense, retaining something can, under appropriate
circumstances, be innovative in itself. See also footnote 5.

22 Arvanitika, of course, is a Tosk dialect, but we assume that by “Tosk”, Chang et al. mean the
standard language, which is based on a Tosk variety.

23 Ringe, Warnow, and Taylor (2002), for instance, as noted in Section 13.4.5, see Albanian and
Germanic as particularly closely related.

24 In the absence of linguistic data about ancient Illyrian, we feel caution is in order about the
connection between Illyrian, whatever that label might have meant to the ancients, and
Albanian, even if that connection might be reasonable from a geographic and archaeological
perspective (so Katičić 1976).
Martin Joachim Kümmel


14.1 Introduction 


Indo-Iranian is mainly divided into the two big sub-branches of Indo-Aryan and
Iranian.¹ IIrn. languages are first attested in the fifteenth century BCE in the
Hurrian state of Mit(t)an(n)i and surrounding areas through divine, throne and 
personal names as well as through hippological terms. Linguistically and 
culturally, this variety seems to belong rather to Indo-Aryan = WIA (cf. 
Mayrhofer 1982; Lipp 2009, 1: 265-73, 310-17). Otherwise, Indo-Aryan is 
confined to south-eastern Afghanistan and the Indian subcontinent — (E)IA, 
with the language of its oldest texts, i.e. the Rigveda, being slightly less archaic 
than WIA. To explain this distribution, we can assume that IA was originally 
a southern branch whose speakers then migrated both westwards and east- 
wards, possibly under pressure from Iranian coming from the north. Iranian 
itself was very widespread from the Pontic steppe towards Central Asia and 
Mesopotamia. Its oldest texts are roughly contemporaneous with Vedic IA. Due 
to this wide geographical distribution, Iranian is more diverse (the validity of 
the sub-branch was even doubted by Tremblay 2005). 


14.2 Evidence for the Indo-Iranian Branch 


Of course, innovations are more interesting than archaisms. There are some 
important laws and changes that are characteristic for Indo-Iranian and which 
could be innovations. 


14.2.1 Bartholomae's Law 


Bartholomae's Law (BL), i.e. the rule on progressive rather than regressive
assimilation in obstruent clusters starting with a “media aspirata” stop, is an active
rule in Sanskrit, and its results are still faithfully reflected in Old Avestan 
morphophonology, although the distinction between media and aspirata has 


1 Conventions of transcription: (P)IE *h, *χ, *ʁ; *k, *g; *q, *ɢ = traditional *h₁, *h₂, *h₃; *ḱ, *ǵ;
*k, *g.
already been lost. Accordingly, it must be reconstructed for Proto-Indo-Iranian
and even Proto-Iranian.² Since there are hardly any traces of this law outside of
Indo-Iranian, it is disputed whether it can be a PIE law or an IIrn. innovation. 
However, the rule was abandoned completely as early as Younger Avestan 
(only isolated examples survived in later Iranian), and so the lack of evidence in 
IE languages attested later than that is hardly significant, since it is highly likely 
that the rule was lost independently. Even its absence from Anatolian and 
Greek may reflect rule loss, since devoicing of the aspirates in the latter 
would have obscured the rule, and in the former, media and aspirata merged 
in the same way as in Iranian and, probably, voicing was lost altogether. 
Furthermore, the rule is even easier to motivate in a stage of PIE that had not 
yet developed aspiration (cf. Miller 1977a; 1977b) and in which the “mediae” 
did not participate in the voicing (or fortis-lenis) contrast. Thus, it is quite 
possible that BL is an archaism, but its loss elsewhere is trivial enough not to 
require a common innovation of the other branches. 
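
As a rough illustration of what "progressive assimilation" means here, the following Python sketch lets voicing and aspiration pass from a cluster-initial media aspirata onto a following voiceless stop. The function name `bartholomae`, the ASCII digraphs (`bh`, `dh`, `gh`) and the restriction to two-stop clusters are assumptions of this sketch; the checks use the familiar Sanskrit type budh- + -ta- > buddha-, not forms taken from this chapter.

```python
# Media aspirata -> plain media (ASCII digraphs are an assumption of this sketch).
ASPIRATED = {"bh": "b", "dh": "d", "gh": "g"}
# Voiceless stop -> voiced counterpart.
VOICED = {"p": "b", "t": "d", "k": "g"}


def bartholomae(first, second):
    """Return the cluster first + second after progressive assimilation.

    If the first stop is a media aspirata and the second a voiceless stop,
    voicing and aspiration end up on the second stop. Any other cluster is
    returned unchanged (regressive assimilation is not modelled here).
    """
    if first in ASPIRATED and second in VOICED:
        return ASPIRATED[first] + VOICED[second] + "h"
    return first + second


print(bartholomae("dh", "t"))  # budh- + -ta-: cluster of the buddha- type
print(bartholomae("bh", "t"))  # labh- + -ta-: cluster of the labdha- type
```

The sketch only shows the direction of assimilation; it takes no position on whether the rule is a PIE archaism or an Indo-Iranian innovation.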


14.2.2 Grassmann's Law


Due to the general loss of breathy voice in Iranian and Nuristanic, it is difficult 
to say whether Grassmann's Law (GL), i.e. the dissimilation of an aspirated 
stop preceding another aspirated stop, occurred already in Proto-Indo-Iranian 
or only later in Indo-Aryan. The latter assumption would imply a rather long 
period without dissimilation, which seems quite possible considering the 
parallel development in Greek, where it clearly happened only after the (rather 
late) devoicing of aspirates. 
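
The dissimilation itself can be sketched as a simple rewrite over a segmented word. This is a minimal illustration under stated assumptions: the function name `grassmann`, the ASCII digraph notation and the choice of the whole word as the domain are simplifications made for the example, and the test form is the textbook reduplicated stem of the dadhā- type, not a claim about the chronology discussed here.

```python
# Aspirated stops and their plain counterparts (ASCII digraphs are an
# assumption of this sketch, not the chapter's transcription).
DEASPIRATE = {"bh": "b", "dh": "d", "gh": "g", "jh": "j"}


def grassmann(segments):
    """Deaspirate an aspirated stop if another aspirated stop follows later."""
    out = list(segments)
    for i, seg in enumerate(out):
        # Look only at the segments to the right of position i.
        if seg in DEASPIRATE and any(s in DEASPIRATE for s in out[i + 1:]):
            out[i] = DEASPIRATE[seg]
    return out


print(grassmann(["dh", "a", "dh", "aa"]))  # first aspirate loses its aspiration
print(grassmann(["bh", "a", "r", "a"]))    # single aspirate: nothing to dissimilate
```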

Scharfe (1996) has argued for dialectal differences in the chronology of the 
application of GL and the Vedic devoicing of sibilant clusters, which would 
necessarily imply a late date for GL. However, this is based on very little 
evidence and does not explain the whole distribution (see Kobayashi 2004: 
106-7, 114-16, 122-7; Lipp 2009, 1: 252-7), so it remains much more 
probable that GL preceded the devoicing everywhere and thus could be of 
PIIrn. date. 

There is a small circumstantial argument for an early date: the 2sg. imperative
of PIIrn. *ǰʰan-/gʰn- ‘to hit, kill’ starts with a palatal in both Vedic jahi and
Avestan jaⁱδi, while the parallel imperative of *gam-/gm- ‘to come’ is Vedic
gahi = Av. gaⁱdi with the expected velar. The palatal in the former might have
been taken over from the strong stem to avoid homonymy of these forms. If this
had happened already in PIIrn., it would presuppose that the two forms *gʰadʰí
‘hit!’ and *gadʰí ‘come!’ had already become homonymous by GL, so that


2 Tremblay (2005) even took this as an argument that the original Old Avestan language still had
two distinctive voiced stop series, i.e. preserved voiced aspirates.






*gadʰí was replaced by *ǰadʰí to solve this problem. However, a parallel
development is not completely excluded: a partial spread of the palatal can
also be observed in other zero-grade forms of *ǰʰan-, cf. Ved. prs.2pl.
hatha, OAv. infinitive jaⁱdiiai.

Furthermore, there is evidence that Tocharian also underwent the
same kind of dissimilation (but see the more cautious assessment in Section
6.5.2 n. 10): while *dʰ normally became t (> c when palatalized) and thus
merged with original *t, it sometimes shows the result ts (palatalized),
merging with *d, and such cases only appear if a second aspirate follows,
e.g. Toch.B gerundive tsikale ‘should be made’ < PToch. *tsik-a- < PIE
*dʰigʰ-, to *dʰeigʰ- ‘form’. For the other stops, the eventual complete merger
of all series makes it impossible to see if there was a similar dissimilation.

As a sporadic or narrowly conditioned change, aspiration dissimilation is
also found in Latin (see Weiss 2018 and Section 8.2 n. 11) and Armenian (only
before a nasal cluster? Cf. Rasmussen 1989: 170-1 n. 16; Martirosyan 2010:
726). In later Indo-Aryan, similar dissimilations also happened again, when
new sequences of breathy voiced stops had arisen.


14.2.3 Brugmann’s Law 


Brugmann (1876) postulated a change of “*a₂”, i.e., *o > (*ō >) *ā in open
syllables before a consonant. This proposal did not gain much support subsequently,
and Brugmann himself withdrew it. However, the reconstruction of
laryngeals led to its resurrection, since it could explain many apparent exceptions
as conditioned by a lost laryngeal (see Kuryłowicz 1927: 206-7;
Lubotsky 2018: 1877). Pace Kiparsky (2010: 82.3), the data are still easier to
explain by applying a real sound law than by invoking a special grammatically
conditioned development of “floating” *o. The counterexamples given by
Kiparsky are either invalid (because they can have original *e or a cluster
*CH) or can be explained by inner-paradigmatic analogy (as *pári-, *dwi-),
while *ā in the first dual and plural of the thematic inflection is not explained by
Kiparsky's account.³

A similar change can be observed in Anatolian: accented *ó was apparently 
lengthened > *6 > Hitt. Luw. d (vs. *á > d), even in closed syllables, cf. Hitt. 
kanki ‘hangs’ < *könkej, Luwian häs ‘bone’ < *χóst.⁴ Unfortunately, it remains


3 Pyysalo (2013: 114-25) rejects the law in its original form but assumes a “corrected” version,
“Brugmann’s Law II”, where lengthening is only found if *o was followed by a lost “glottal
fricative” *h (= *h2), while he rejects all other compensatory lengthenings caused by laryngeals.
This leads to unnecessary postulation of a glottal fricative for all cases of Indo-Iranian ā = European
*o, and his reconstruction methodology is very problematic in general.

4 Chronology and details are disputed, see Kloekhorst 2008a; 2008b: 98-9; 2014: 250, 553-9;
583-4 vs. Melchert 1994: 105, 131, 243-4, 264; 2012b.
unclear if this was an early change and if it happened in all Anatolian languages
(in Lycian *e and *o clearly merged into e, but the quantity distinction was lost 
there). 

The mechanism of this sound change is not really clear: could it have been
a lengthening of “tense” [o] vs. “lax” [e] (Keydana 2012)? Or is it rather
a kind of relic of an originally long vowel (Kümmel 2012: 308-20), similar to
what Brugmann proposed (cf. also Viredaz 1983: 35-7; Woodhouse 2012:
2 n. 1; 2015: 6-9)? This last option would presuppose a common innovation
of most other languages, i.e. shortening of *ō in most environments (preceding
*oH > *ō); however, this is difficult to reconcile with preserved IE *ō in at
least forms with lengthened grade.


14.2.4 The Vowel Merger 


The most striking feature of Indo-Iranian is the merger of all non-high
vowels instead of partial mergers in the neighbouring languages; elsewhere
this is only found in Luwic (at least in Luwian). It is probable that this
merger happened in two stages: first a lowering with a merger of non-front
*o = *ɑ > (back) *a, then a merger of front *æ = *a > (central) *a. The
intermediate stage with *æ : *a might be reflected by some Uralic loanwords,
but this is not certain.
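
The two-stage scenario can be made concrete with a small ordered-rule sketch. It assumes, for illustration only, that stage 1 lowers *e to a front low vowel (written `æ`) and merges the non-front vowels *o and *a into a back low vowel (written `ɑ`), and that stage 2 merges the two low vowels into a single `a`. The symbols, the mapping tables and the helper `apply_stage` are assumptions of this sketch, not reconstructions argued for in the text.

```python
# Stage 1 (assumed): lowering of *e, merger of the non-front vowels.
STAGE_1 = {"e": "æ", "o": "ɑ", "a": "ɑ"}
# Stage 2 (assumed): merger of front and back low vowels.
STAGE_2 = {"æ": "a", "ɑ": "a"}


def apply_stage(vowels, mapping):
    """Rewrite every vowel that the mapping mentions; leave the rest alone."""
    return [mapping.get(v, v) for v in vowels]


inherited = ["e", "o", "a", "i", "u"]
intermediate = apply_stage(inherited, STAGE_1)
final = apply_stage(intermediate, STAGE_2)

print(intermediate)  # two distinct low vowels remain after stage 1
print(final)         # a single low vowel remains after stage 2
```

Loanwords borrowed between the two stages would, on this model, still distinguish a front from a back low vowel, which is the kind of evidence the Uralic material might provide.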

The more restricted merger of *a and *o is much more widespread: it is
attested both in Anatolian (except Lycian but cf. above) and a “north-eastern
European” area from Albanian and Messapic to Balto-Slavic and Germanic. In
fact, only Tocharian and the southern languages from Celtic to Armenian show 
a distinction of these vowels. Thus the first step of the Indo-Iranian merger 
might be part of a larger areal development. For long *a and *o the merger is 
restricted to Anatolian, Germanic and Slavic, and in non-final syllables it is also 
found in Celtic. Albanian merges *e and *a, probably together with Messapic 
and Phrygian. 


14.2.5 The Liquid Merger 


The apparently complete merger of PIE *l = *r > *r is not found anywhere else
in IE. Substrate influence is therefore quite probable, but no known language in
the relevant regions shows this phenomenon. Note that the often assumed
“retention” of l in some cases in IIrn. languages is probably a mirage with no
historical foundation (see Hock 1991: 138; Mayrhofer 2004); there is no
attested variety in which l shows a statistically valid correlation with PIE *l.
Preservation is also contradicted by the fact that the liquid merger fed the ruki
development, i.e., PIE *ls turned into *rs > *rš in all of Indo-Iranian, cf. the
root *kʷels-/*kʷols-/*kʷl̥s- (Gr. τέλσον ‘furrow’) > PIIrn. *čarš-/*karš-/*kr̥š- ‘to
pull, draw, plough’ > Ved. carṣ-/karṣ-/kr̥ṣ- = Iranian *karš-/*kr̥š-.⁵
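
The feeding relation between the liquid merger and ruki is a matter of rule ordering, which a short sketch can make explicit. The segment list, the trigger set (the traditional r, u, k, i only) and the function names are simplifying assumptions for illustration; the input is a schematic sequence of the shape *-als-, not a reconstructed form.

```python
# Simplified set of ruki triggers (assumption: only r, u, k, i are modelled).
RUKI_TRIGGERS = {"r", "u", "k", "i"}


def liquid_merger(segments):
    """Merge *l into *r."""
    return ["r" if s == "l" else s for s in segments]


def ruki(segments):
    """Retract *s to š immediately after a ruki trigger."""
    out = list(segments)
    for i in range(1, len(out)):
        if out[i] == "s" and out[i - 1] in RUKI_TRIGGERS:
            out[i] = "š"
    return out


word = ["k", "a", "l", "s"]

fed = ruki(liquid_merger(word))    # merger first: the new r triggers ruki
unfed = liquid_merger(ruki(word))  # ruki first: *ls is not affected

print(fed)
print(unfed)
```

Only the first ordering yields the retracted sibilant, which is why the attested outcome speaks against any preservation of *l at the time ruki applied.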


14.2.6 Weak Stem in Accusative Plural 


While in Indo-Iranian the accusative plural belongs to the “weak” stem, 
elsewhere it normally belongs to the “strong” stem. The only exceptions
are “proterokinetic” i/u-stems with *-ej-es : *-i-ms; *-ew-es : *-u-ms. 
The simplest explanation for this difference is an Indo-Iranian innov- 
ation, used to repair the homophony of accusative *-ms > -as = nomina- 
tive *-es > -as, building on the existing model of the i/u-stems (see Hock 
1974). 


14.2.7 Laryngeal Aspiration 


Indo-Iranian is the only branch with incontestable examples of aspiration
caused by a following laryngeal. The most famous examples show *χ after
stops:

• *megχ- > *majh- > *majʰ- ‘big’? > Ved. máh-, cf. Gr. μέγα, Hitt. mekk-
(together with Iranian *majh- > *mach- > *mac- > mas-, maθ-, see
Section 5.3)

• *sistχa- > *sistha- > Ved. tiṣṭha- ‘to stand’, cf. Gr. sta-, -sth- (in cases like
Opec0-)

• *pltχú- > *prthu- ‘broad’ > Ved. pr̥thú- = Av. pərəθu-, cf. Gr. Πλαταιαί

• pf.2sg. *-tχa > Ved. -tha = Av. -θa, cf. Gr. -tha, cf. also Vedic mid.2sg. -thas.

For *h, this is controversial, but there are some potential examples:

• *pónt-e/oh- ~ *pnt-h- > *pántā- ~ *path- > Av. pantā- ~ paθ- (see de Decker
2012)

• 2pl. *-the > Ved. -tha = Av. -θa (but cf. Sabellic *-tā < *-tah; if not from the
dual?).

Continuation of *h as an aspirating sound would also be supported by *dʰedʰh- >
*dadh- > *dath- > Proto-Iranian *daθ- (see above), but this example does not
show secondary aspiration as such.


5 There have been attempts to include pre-PIIrn. *l as [+high] (which would require a change to
a palatal or at least retroflex) in the sounds that triggered ruki (see Lipp 2009, 1: 33 n. 72) but this
is contradicted by its different behaviour in all other ruki languages. There is no other evidence
that IE or pre-PIIrn. *l had an articulation place different from *n. Fortunatov 1881 claimed
a special development of *lt > Skt. ṭ (etc.), but this is generally rejected today; Pyysalo (2013:
227-43) has tried to modify Fortunatov’s Law by also including *r but assuming an adjacent
“diphonemic pair” *ah/ha = laryngeal as additional conditioning. This cannot be accepted since
it is phonetically unmotivated, and the general approach is based on a flawed reconstruction
methodology and much dubious material.
In Greek only *TχV > TʰV seems to be possible, but this is disputed (cf.
Cowgill 1965 vs. Peters 1991), and other branches show no clear evidence.
Armenian and Slavic seem to show x < *kχ, cf. *tkáyky-/ (Hhykay- ‘branch’ >
Arm. c‘ax, CSl. *soxá (c) = Ved. śākhā-, Sogd. sax (beside MPers. sag), but this
does not necessarily presuppose an intermediate stage with aspiration. No other
evidence is found in languages without phonological aspiration.

Notably, it is not altogether clear if Iranian participated in the development of 
aspiration, or if clusters of stops + */ just underwent preconsonantal fricativi- 
zation of stops followed by loss of *h (see Kümmel 2018c: 162-4). 


14.2.8 A Striking Difference 


There is one striking difference between IIrn. and the rest of Nuclear IE
(= Indo-Tocharian, see Olander 2019): “vocalization” of laryngeals leads 
to low(er) vowels everywhere from Tocharian to Celtic, and from Greek to 
Germanic, but in Indo-Iranian, we only find the high vowel i, and Iranian 
and Indo-Aryan do not agree in the conditioning, with Iranian most often 
showing no vowel. The simplest explanation for this situation is that 
epenthesis was partly post-PIIrn. (see Kümmel 2016c; Aufderheide &
Keydana 2016), and that i is not a direct reflex of the laryngeal. It can 
thus rather be compared to Greek cases of “schwa secundum” = i insertion 
(de Vaan 2009). This rather strong difference might be interpreted as an 
early divergence of Indo-Iranian vs. the rest. However, differences in 
details exist between all other branches, too, so it remains unclear how 
fundamental this is. 


14.3 The Internal Structure of Indo-Iranian 


In the oldest stage, there are no fundamental or significant grammatical differ- 
ences between Iranian and Indo-Aryan. The morphology and syntax of the 
earliest Vedic and Old Avestan texts are very close, and the main differences are 
found in phonology and lexicon. 


14.3.1 Phonological Features 


An overview of the main phonological differences is shown in Table 14.1 (clear 
innovations are shaded). 


6 There are no really good examples of “vocalization” in Anatolian: weak stems like as-, ad- for
*h s-, *h,d- are possibly analogical, and Luw. tuwatr-, Lyc. kbatra ‘daughter’ is not clear
enough.
Table 14.1 Main phonological differences between Iranian and Indic


Proto-Indo-Iranian Iranian Indic Remarks 

*b, *d, *g LE Kg o b,d,g b,d,g : b^, d^ g^ merger 

*p, *t, *k/ C OR p.t, k fricativization 

*ph, *th, *kh f 0.x ph, th, kh (only a special case of the 
, previous row) 

"Er *ts, *dz > s/0, 2/6 Sy j depalatalization 

WAVES *g *dz: *j flO E o merger 

*s h s 

*y Š $ only phonetic 

*D, *z[) zd, zd wal, ael not yet in WIA 

*tst, *dzd^ st, zd tt, dd’ different simplification 

"S D*EY ro > Š: xr ks : ks dissimilation, merger 

*p Far) *r only phonetic? 

*ar ar (- ar?) irlür see Cantera 2001 

*h- hix ~ Ø Ø Kümmel 2016a: 83; 


*Dh, *Bh, *Jh *Dahilu 


*-CHC- 


*pt- 


*pst-, *db-, *dm- 
*-ks(t), -kSt- 


*0, *f. *ts *Üailu 


CC 


fi 


fst, db, dm 
xš(t), xst 


d^, b^, h *dailu 


Cie 


pit 


st, b, m 
k, kt 


2018c: 166 

Kümmel 2016a: 82-3; 
2018c: 165-6 

see Werba 
2005; Kümmel 2016c: 
219-22 

epenthesis (Kümmel 201 6c: 
222-3) 

see Kümmel 2014: 211-12 

simplification, see Kümmel 
2014: 212-14 


For Proto- or Common Iranian affricates see Lipp 2009, 1: 183-91; Peyrot
2018; for the development of “thorn” clusters (*tk > *té > *ts etc.) see Lipp
2009, 2: 1-313 with refs.


14.3.2 Morphosyntactic Features: Iranian vs. Vedic 


There are a number of mostly minor differences in morphological detail 
between Old Iranian and Vedic Sanskrit. Most often, Indo-Aryan has innovated 
while the older stage is better preserved in Iranian (Table 14.2). 

However, there are also some cases where Old Avestan stands against an 
innovation in Younger Avestan, Old Persian and Vedic, so it seems that there 
was a parallel development in Indo-Aryan and younger Iranian (Table 14.3). 

Only rarely is it Old Avestan that innovates vs. archaisms in Younger 
Avestan and (if applicable) Vedic (Table 14.4). 
Table 14.2 Morphological differences between Iranian and Indic


Iranian Indic Remarks 

gen. : loc. dual *-üs : *-aw *-qws cf. Slavic *-u < *-au(s) 
instr.-dat.-abl. dual *-aybhyà > OAv. *-ab*ya(m) > 

-Oibiid, YAv. -abhyam 

-aebiia, 

OPers. -aibiya 
u-stem type -aw- (nom. -dus, -am -úş, -úm 

sg., acc.sg.) 

n. n-stem gen.sg. *-ans > OAV. -2ng -nas 
a-stem instr.sg. ui -éna (-à) 


comparative *-yás-, 
perf.ptc. *-wäs- 
nt-ptc. to thematic stems 


Isg. pronoun gen. 
2pl. pronoun nom. 


3ps. encl. dative 
possessives 


distal demonstrative 


interrogative 
numeral ‘one’ 
middle thematic ptc. 
active optative 

3pl. SE 
subj.mid.1sg. 


mid.3pl. 


*_yah, *-wäh- 


*_ant- 


*mana 
*yüz-am 


*hai ~ *Sai 
av. ma-, Oßa- 


*awá- acc.sg. 
*aw-am 

ci-, ca- : ka- 

*aywá- 

-mna- 

-T- :-yd- 

-at (: -rS) 

*-anai (~ *-di) 

*-arai, *-àra(m) ~ 
*-rai, *-ra(m) 


-yams-, -vams- 


-at- ablaut taken over 
from athematic 
bases 

máma but cf. Khot. mamä 

yüyám contamination with 
Ipl. vayám 

- loss in Indic 

- loss in Indic (but also 
in later Iranian) 

amú- acc. see Klein 1977 

sg. *am-u 

ka-(kim) generalization of k- 

*ayka- 

-màna- but cf. MIA -mina- 

only -yã- (few relics of *°aH-i-) 

only -ur 

only *-ài 

only *-rai 


14.3.3 The Special Case of Nuristanic 


The so-called Nuristani languages are spoken just between Eastern 
Iranian and NW Indo-Aryan in the Hindukush region. They are only 
attested in modern times and represent a group of transitional languages 
between Indo-Aryan and Iranian, rather difficult to classify due to the 
lack of ancient data. In some features, they agree with Iranian, in others 
with Indo-Aryan, but they clearly differ from both since early times: 
Table 14.3 Morphological archaisms in Old Avestan


Old Avestan Elsewhere Remarks 
accusative 1/2pl. na < *näs *nas, *was (= dative- cf. Lat. nos, uos, OCS ny, vy 
vd < *was genitive) 
nom.acc.pl.n. r/ -àr? YAv. -qn = Ved. -än-i cf. Hitt. -ar 
n-stems 
lsg. present -à ~ -àmi only -ami cf. general European *-o 
velar ~ palatal aögö YAw.aojo = Ved.ójas generalized velar in Ved. 
alternation dgas-, okas-, elsewhere 
palatal 
inflection of *wicwa- — OAv. vispayhd Ved. víśve, anyé YAv. pronominal desinences of 
‘every, all’ *anya- “Median” vispe, aniie OPers. adjectives (archaism in 
‘other’ aniyaha aniyai OXAv. not sure) 


Table 14.4 Morphological innovations in Old Avestan 


YAv. (7 Ved.) Old Avestan Remarks 
gen.sg. *krátwas, xra0Bo = krátvas xrataus most productive 
*pacwás, *pitvás pasuuö = pasvás pasaus inflectional type 
*piOpo = pitvás pitaus 
acc.pl. *prtwás paraOpo paratus 
weak stem *majh-, mas-, da- maz-, dad- analogy after strong 
*dadh- > *mac-, *da0- = mah-, dadh- stem mazä-, dadà- 


• “Iranian” features: depalatalized *ts, dz distinct from *č, ǰ, no aspirates
(= deaspiration)⁷; rather trivial developments (also attested in neighbouring
Indo-Aryan but much later)

• “Indic” features: *tst, dzdʰ > tt, dd; *ar > *i/ur; preserved s, no
fricativization

• special features:

• *s/tc > *ts > *ts vs. Iranian S, Indic ks, cf. Kati ic ‘bear’


7 However, since Dameli (in spite of some doubts) probably belongs to Nuristanic and appears to
show voiceless aspirates in line with Indo-Aryan, the loss of voiceless aspirates in the rest of 
Nuristanic may be a late innovation. For voiced aspirates, the merger of the palatal aspirates with 
the simple voiced palatals presupposes a chronology different from Indo-Aryan, but this only 
requires that aspiration was lost before the debuccalization of palatal aspirates. 

8 With one probable exception: *warnd- ‘wool’ did not become *wurnd- (> Ved. ürnä-) but 
*warnd- > *würd-; cf. Av. var’nä-. 
Table 14.5 Phonological changes in Iranian, Nuristanic and Indic


RER oj *di rd Ha "spo Cy Hr Cp CS *st *rn *nt 
Irn. ts > s/0, d d a st h 60r © 5 st mC nt>nd 
dz > zld rr/nn) 
Nur ts, dz aa d ilu tt s tr if ts Ba lA nt (> t) 
Ind. Sblo d! d ilu tt s ir (P ks st rn nt> 
(> nn) NW 
nd 

Figure 14.1 The Indo-Iranian languages (tree diagram: Indo-Iranian branching into Iranian, Nuristanic and Indic)


• *st > st (Kati dust ‘hand’); *s/s > s (secondary, see Cathcart 2011); Vrn
(> *rr?) > Vr
• no voicing in nt, nk, nc (vs. most neighbours).

See Table 14.5 (innovations shaded). 

The most recent discussion is by Werba 2016, who argued that Nuristanic 
forms a subgroup with Indo-Aryan; but even if he was right to stress that 
similarities to Iranian do not require a common stage, the differences from 
Indo-Aryan are strong enough that for all practical purposes, Nuristanic has to 
be treated as an independent third branch (see Figure 14.1). It did not partici- 
pate in most early innovations of either Iranian or Indo-Aryan. 

In the lexicon, Nuristanic shows some possibly ancient similarities to Iranian 
(e.g., *khanda- ‘to laugh’, *waina- ‘to see’, *arjana- ‘millet’, *pragama(ka)- 
‘young animal’, *//ayan- ‘winter’, *tridaca ‘13’, *katrudaca ‘14’), but much 
more often it agrees with Indo-Aryan, which, however, could be due to secondary 
influence in most cases. It does not share most typical early Iranian (potential) 
innovations like *g“ausa- ‘ear’, *Katsman- ‘eye’, *wasuni- ‘blood’, *atr- ‘fire’, 
*swar- ‘to eat’. 


14.3.4 Lexical Differences 


Some examples of lexical differences between the main branches are shown in 
Table 14.6 (dating of innovations is of course uncertain in Nuristanic due to the 
lack of ancient data). 
Table 14.6 Examples of lexical differences between Iranian, Nuristanic and Indic


Iranian Nuristanic Indic Remarks 
(Avestan) 
‘fire’ ätar- agni- choice of inherited terms; replaced 
(-ayni-) by angära- ‘glowing coal’ in 
*angara- Nuristanic and *Dardic" IA 
‘water’ — — varludan- derivatives in Im. Nur. 
dp- *dp- äp- 
‘rain’ *warsa- varsá- 
vára- derivative of *waHr ‘water’ 
‘eye’ (asi) *aksi aksi parallel innovations 
casman- cáksus- 
ear’ FUST *karna- > kárna- cf. Av. kar’na- = Ved. karná- 
*kara- “deaf” 
*gausa- Ved. ghösa- ‘sound’ 
“to eat’ xar- *yaw- ad- *yaw- also in Waxi and 
Chitral IA 
*to drink xar- *pd- pa- relics in easternmost Iranian: 
Waxi pav- < *piba- 
“to see’ vaena- *waina- Ved. véna- ‘to look after’? 
pásya- Av. spasiia- ‘to watch’ 
‘blood’ vohuni- *asan- ásrk, asan- 
‘bird’ vi- ? vi- 
maraya- *mrga- paksín- Ved. mrgá- *animal, game, deer 
‘spring’ *wasar- 
*wasanta vasantá- 
‘winter’ zim- 
zaiian- *jayan- 
hemantá- 
‘ice *yaja- ? 
aexa- 
‘snow’ snaiga- * snih-, *sneha- 
*jim- jima- himá- 
vafra- 
*moon' mäh- mäs- mäs- 
candramäs- 
‘sky’ (diiu-) dyav- 
asmän- 
abra- 
‘stone’ asan-, ásman-, ásn- 
asanga- 
*warta- *warta- *warta- 
*eari- *giri- 
‘mountain’ gari- giri- 
pa'ruuata- párvata- 
*kaufah- 
*dara- *dhara-? 


9 Derived from the noun *waind- > Ved. vend- ‘watcher’, MPers. wénag ‘guard, watchman’; Indic
preserves the narrower meaning; the broadened meaning may also be reflected by an apparently 
old loanword into Western Uralic, cf. *wajna- > Southern Saamic *wuojne- ‘to see’, Mordvin 
vano-/vana- ‘to watch’ (see Holopainen 2019: 312-13). 
For differences in most of the agricultural terminology (as opposed to animal
husbandry), see Kümmel 2017. 


14.4 The Relationship of Indo-Iranian to the Other Branches 


14.4.1 The Central IE Sound Shift 


Indo-Iranian seems to belong to the group of IE languages that reflect voiced
aspirates and thus presuppose the “central IE sound shift” (Kümmel 2012: 304-6;
2016c: 130-2), i.e. a chain shift from PIE (PIA) *d : ɗ > Central IE *dʰ : d.
This is clear for Indo-Aryan, which has had breathy voiced stops ever since
Sanskrit. However, it has been proposed that this change did not happen in
Iranian (and Nuristanic) where aspiration of media aspirata (MA) is not directly
preserved (Lubotsky 2018), so the sound shift would only be an Indo-Aryan
innovation, parallel to Greek etc. This is not very easy to determine. One
possible argument for pre-Iranian aspiration might be Bartholomae’s Law,
the outcome of which is still faithfully observed in Old Avestan. However,
this law is possibly older still, since it works even better with pre-shift phonology
(cf. progressive voicing as in Turkish) if implosives did not participate in
the voicing distinction (cf. above). Thus, its reflection in Old Avestan does not
necessarily presuppose aspiration but only some distinction between “media”
and “media aspirata”. At first sight, Iranian *dugdar- ‘daughter’ < *dugdʰar-
appears to presuppose a post-PIIrn. application of BL, since *dugʰtar- can only
have arisen secondarily by loss of the laryngeal in *dughtar- < *dʰugχter-.
However, such an allomorph might already have been present in PIIrn. and
simply been ousted in Indic (see Lipp 2009, 2: 370-84; Kümmel 2018c: 169).
Within a “glottalic” reconstruction of PIIrn., one could also assume *dugHtar-
[’g?] > *dug(H)tar- [g?] > *dugdar- so that we would not strictly need aspiration
to be present. However, there is at least one change in Iranian that seems
to presuppose aspiration of the MA, namely the transfer of postnasal aspiration
to the preceding onset seen in *tengʰ- > *tangʰ- > Iranian *thang- > *θang- ‘to
pull’ and maybe also in *kumbʰa- > *khumba- > Iranian *khumba- > *xumba-
‘pot’. This might be supported by a systemic argument: Indo-Iranian does not
show any bias against “mediae” after nasals, as one might expect for implosives,
so it seems more probable that the “mediae” had already become voiced
explosives.


14.4.2 The Satem Phenomenon 


The so-called satem languages show palatal or depalatalized coronal affricates 
or fricatives corresponding to centum velar stops, and simple velars corres- 
ponding to centum labialized velars (labiovelars). In a third type of 
correspondence, all languages have simple velars. The usual PIE reconstruction
is so-called “palatals” in the first case, “labiovelars” in the second and “pure
velars” in the third. However, the existence of real “pure velars” in PIE has
been questioned, and this type of correspondence could also be explained by 
neutralization of an original twofold contrast between “palatovelars” and 
“labiovelars”. 

The satem languages comprise all Eastern languages except Tocharian, 
while the areal distribution of centum languages looks much less compact, 
including the outliers Anatolian and Tocharian, and the European West and 
South. Therefore, the centum situation is most probably original, and the satem 
group underwent a chain shift *kʷ : *k > *k : *c. This is a rather trivial phonetic
change, but details of phonologization and distribution are far from trivial, cf. 
forms like *(H)októH ‘eight’, synchronically isolated. This requires the 
assumption of one areal change, possibly cutting across other isoglosses. 
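
What makes this a chain shift rather than a merger is that both steps apply to the same input stage. The sketch below contrasts a simultaneous mapping with a naive sequential one; the ASCII labels `kw`, `k`, `c` and the function names are assumptions chosen for the example.

```python
# Chain shift as one simultaneous mapping over the input inventory.
CHAIN_SHIFT = {"kw": "k", "k": "c"}


def shift_simultaneously(segments):
    """Each input segment is looked up once, so kw > k does not feed k > c."""
    return [CHAIN_SHIFT.get(s, s) for s in segments]


def shift_sequentially(segments):
    """Naive ordering kw > k, then k > c: the two series collapse."""
    step1 = ["k" if s == "kw" else s for s in segments]
    return ["c" if s == "k" else s for s in step1]


inventory = ["kw", "k"]
print(shift_simultaneously(inventory))  # contrast preserved
print(shift_sequentially(inventory))    # contrast lost
```

The satem outcome corresponds to the first function: the two series stay distinct, only with shifted values.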

The satemization is apparently connected to another areal feature, that of the 
ruki rule, i.e. a retraction of *s after non-anterior sounds, which is found in 
more or less the same branches, though to different degrees (with some 
restrictions in Slavic and Baltic, and only to a very limited extent in 
Armenian and Albanian, see Martirosyan 2010: 709-10 with refs.). This 
allophony may have been more widespread in IE but was only phonologized 
in satem languages since only these developed additional sibilants from other 
sources (see Andersen 1968). 

Similar developments of “palatals” are found in Luwic Anatolian, but then
combined with preserved labiovelars. According to the most recent investigation
(Melchert 2012a), there was a conditioned palatalization of old “palatals” only;
but the claim that original “pure velars” contrastively remained unpalatalized is
unsubstantiated: the only example of a preserved velar before a front vowel is
Luwian Kisd(i)- ‘to comb’, and this may have analogical k- or even continue *ks-
(there was a regular change of *ks > kis in Hittite, no counterexamples in
Luwian). So Luwian might in fact reflect the usual “centum” merger of “palatals”
and “velars”, followed by a conditioned palatalization of the resulting velars.
However, some words appear to show Luwic “palatals” in environments where
secondary palatalization would be improbable: cases like Luw. zanta ‘down’
(Goedegebuure 2010) < *kant- (cf. Hitt. katta, Gr. katá) and also HLuw. azu(wa)-
‘horse’, zuwan- ‘dog’ < IE *ekw(o)-, *kwon-, if the latter are not to be read as
asu(wa)-, suwan-, borrowed from WIA (as argued by Szemerényi 1976; Lipp
2009, 1: 273-302). If these words show a genuine Luwic development, this looks
much more like preserved IE “palatals” than anything secondary.¹⁰ Interestingly,


10 One might consider an intermediate stage with a secondary front vowel in the first case, so
something like pre-Luwic *kmt° > *kent° > *kant° > zant°, but this does not work for the two
words with *kw.
recent research has also found some ruki-like developments in Luwian (Rieken
2010), which would support the idea that the Luwic developments are satem-like. 
Currently, it is still unclear how exactly this might be explained. 


14.4.3 Middle Primary Endings 


The “primary” endings of the middle are marked by *-y, identical to *-i used in 
the corresponding endings of the active. Here IIrn. agrees with Armenian, 
Albanian, Greek and Germanic, while the more “peripheral” branches 
Anatolian, Tocharian, Italic and Celtic show *-r. The latter has been interpreted 
as an archaism and marking by *-i/y as analogical (see Dunkel 2014: 669-70). 
However, much is still unclear here. In Phrygian, we find -toy earlier than -tor 
(but never -to). In Tocharian, the preterit middle 1sg. *-ai, 2sg. *-tai could be 
explained as relics of older -i-endings (see Malzahn 2010: 44-6 with refs.). In 
Celtic and Italic, -r is not used in all cases, which might point to an incomplete 
spread. 
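
The distribution described in this paragraph can be tabulated as a simple isogloss table. The sketch records only the branch assignments stated above and takes no position on which marker is the archaism; Phrygian is left out because both -toy and -tor are attested. The dictionary name and the helper `group_by_marker` are assumptions made for illustration.

```python
from collections import defaultdict

# Marker of the primary middle endings per branch, as stated in the text.
MIDDLE_PRIMARY_MARKER = {
    "Indo-Iranian": "i",
    "Armenian": "i",
    "Albanian": "i",
    "Greek": "i",
    "Germanic": "i",
    "Anatolian": "r",
    "Tocharian": "r",
    "Italic": "r",
    "Celtic": "r",
}


def group_by_marker(table):
    """Return a dict mapping each marker to the sorted list of its branches."""
    groups = defaultdict(list)
    for branch, marker in table.items():
        groups[marker].append(branch)
    return {marker: sorted(branches) for marker, branches in groups.items()}


groups = group_by_marker(MIDDLE_PRIMARY_MARKER)
for marker, branches in groups.items():
    print(f"*-{marker}: {', '.join(branches)}")
```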

In Greek, the 1pl. and 2pl. endings are not marked by -i (mirroring the situation
in the active), but in Indo-Iranian, they also have a final diphthong
*-ay, resulting from a further spread, viz. 1pl. *-madʰay < *-medʰoj for *-mesdʰχ.
The same probably happened in Armenian, Albanian and Germanic (see
Kümmel 2018b: 194).


14.4.4 Verbal Dual Endings 


The non-present endings Ved. 2du. -tam, 3du. -tām seem to agree perfectly with
Gr. 2du. -ton, 3du. -tān < *-tom, *-tām. However, the corresponding Avestan
endings -təm and -tąm are both used for the 3du., and Toch.B 3du. -te-m (with
a secondary nasal) might support the use of *-tom for the 3sg. Similarly,
Avestan does not reflect the distinction of Ved. 2du. -thas : 3du. -tas but used
-θo = -to indiscriminately. Gothic 2du. -ts seems to agree, but Greek uses
a different ending with no distinction 2=3du. -ton. The Baltic 2du. *-tas and
Slavic 2du. -ta, -te, 3du. -te do not agree completely, so a precise reconstruction
remains difficult (Pooth 2011 has argued for a secondary differentiation and
a connection to the middle).


14.4.5 Formation of Accented Personal Pronouns 


The PIIrn. stems of the accented non-singular personal pronouns are 1pl. *as-má-,
2pl. *us-má- < *ns-mé-, *us-mé- vs. 1du. *awá- < *aH-wá- < *nH-wé-
(2du. *yuwá- = *uH-wa-). This agrees most closely with Greek 1pl. *ahme,
2pl. *uhme > Aeol. ἄμμε, ὔμμε; Dor. ἁμε-, ὑμε-; Ion.-Att. ἡμε-, ὑμε- and 1du.
*nō-we < *noH-we (but 2du. σφώ). Elsewhere we either find only *nōs, *wōs
(Italic, Balto-Slavic, Albanian) or 2pl. *uswe: Celtic 2pl. *swis; Germanic
*izwiz or even 1pl. *nswe > Hitt. anze-, sume-, Luw. anzu-, unzu-. The PIE
situation is not very clear: apparently extension of the base by both *-me and
*-we was possible, and various scenarios have been proposed:

a. pl. *-me vs. du. *-we (Cowgill 1965 = IIrn. + Gr. archaism)

b. 1st *-me vs. 2nd/3rd *-we (Katz 1998: 279)

c. “inclusive” *-me vs. “exclusive” *-we (Dunkel 2014: 494, 499, 569-74).¹¹

An original inclusive/exclusive distinction appears most promising, but
typologically, an inclusive first person (in the usual definition ‘me and
you’) often shows a marker of the second person, and this might favour
a distribution of first person exclusive *-me (cf. 1sg. *me-) ‘me and someone
else but not you’ vs. first person inclusive ‘me and you’ + second person *-we
(cf. second person *wo-). In this case, Greek and IIrn. would show a common
innovation, i.e. generalization of the exclusive marker *-me in the first person
plural followed by its spread to the second person plural, and generalization
of the inclusive marker in the first dual. However, this innovation need not be
exclusively Greek and IIrn., since corresponding forms might have been lost
in all branches that lost these extended forms, i.e. Italic, Albanian, Balto-Slavic
and Tocharian.


14.4.6 Augment 


The so-called augment, i.e. a verbal prefix marking the past vs. the injunctive,
is only found in Indo-Iranian, Greek, Armenian, Phrygian and Albanian and
might be either an archaism lost elsewhere or a common innovation.
However, it seems clear that much of the development was parallel rather
than shared, since in the earliest records the prefix had not yet become an
obligatory marker. The original element must therefore have been much less
grammaticalized, in which case its loss in other branches is much easier to
assume.


14.4.7 Primary Superlatives 


The primary superlative is derived from the primary comparative by the suffix
*-t(H)o- in Indo-Iranian, Greek and Germanic, while Italic and Celtic show
*-is-m(H)o-. Since both suffixes correspond to suffixes of some ordinal numerals
(see Lujan 2019), a parallel development is not unlikely.


11 Dunkel’s reconstruction is based on the particles *me ‘within, together with’ and *we ‘or’, and
he uses an unusual definition of inclusive = ‘me and a third party’ vs. exclusive = ‘me without
a third party’.


14.4.8 Secondary Comparatives


The suffix *-tero- serves as a productive secondary comparative only in IIrn.
and Greek, while elsewhere it can only be derived from pronouns and adverbs. 
However, the corresponding superlative formation is different: Greek -tato- vs. 
PIIrn. *-tama-. Therefore, the development was not identical, so the probability 
of a parallel extension of the existing departicular system is quite high. 


14.4.9 Formation of Decades 


The PIIrn. cardinal numerals ‘thirty’, ‘forty’ and ‘fifty’ are formed by
a suffixoid *-(d)ca(n)t-, based on compounds with *-dkomt-/*-dkmt-. This
seems to agree only with Celtic, where all decades from thirty to ninety are
formed with *-dkomt-. By contrast, Armenian, Greek, Italic and Tocharian
show a slightly different formation with cardinal + collective *dkomth₂/*dkmth₂,
and Germanic and Balto-Slavic only use a syntagma with the free word *dkmt-
(cf. Rau 2009 for an overview and discussion). Since the original situation
remains unclear, the significance of the Celtic-IIrn. agreement is uncertain as well.


14.4.10 Instrumental, Dative and Ablative Dual and Plural 


In endings of the instrumental, dative and ablative dual and plural, the PIIrn. 
set *-bʰya, *-bʰis, *-bʰyas corresponds more closely to the “southern” set *-bʰoH,
*-bʰis, *-bʰos attested from Armenian to Celtic, in contrast to “northern” endings
with *-m° in Germanic and Balto-Slavic. Both sets are probably innovations, but 
the precise development still needs to be clarified (see Melchert & Oettinger 
2009; Kim 2013); in any case, the agreement with the southern group indicates 
closer contact, but differences in details favour an areal development rather than 
an inherited innovation from a common pre-stage. 


14.5 The Position of Indo-Iranian 


There can be no question that all Indo-Iranian languages are related to one 
another much more closely than to any other IE language, so Indo-Iranian is 
clearly defined as a primary branch of IE. The relationship of Indo-Iranian to 
other branches, however, is much less easy to describe. It has variously been 
grouped together with quite distinct branches in the history of IE linguistics. 


14.5.1 Different Trees 


Nearly all cladistic models assume Anatolian to have split off first
(“Indo-Hittite” model) from PIE with the remaining branches becoming NIE, and most
also assume a second split-off of Tocharian vs. Inner IE (= Indo-Celtic, see
Olander 2019) from NIE. Otherwise, they differ in many ways, as in the
following overview, with the branches grouped according to how close they
are to Indo-Iranian:

* Schleicher's first trees (1860; 1861; 1862): 1. Graeco-Italo-Celtic, 2. 
Germanic-Baltic-Slavic 

* Gamkrelidze & Ivanov 1995: 1. Armenian, 2. Greek, 3. Germanic-Baltic-Slavic,
4. Italic-Celtic-Tocharian

* Hamp 1990: 302: 1. Indo-Iranian = “Asiatic IE” vs. 2. “Residual IE” (all the
rest including Tocharian)

* Starostin 2004 (core lexicon only, glottochronology):¹² 1. Balto-Slavic, 2.
Germanic-Italic, 3. Armenian, Greek, Albanian


Trees based on computational phylogenetic methods: 

* Ringe, Warnow & Taylor 2002 (mixed features; Germanic not classified): 1. 
Baltic-Slavic, 2. Greek, Armenian, 3. Italo-Celtic, 4. Albanian 

* Gray & Atkinson 2003; Bouckaert et al. 2012 (core lexicon only, problematic 
database, Bayesian): 1. Albanian, 2. Baltic-Slavic-Germanic-Italic-Celtic, 3. 
Greek-Armenian 

* Chang et al. 2015 (same database and method, different calibrations): 1.
Baltic-Slavic-Germanic-Italic-Celtic, 2. Greek, Armenian, Albanian.¹³

Thus all neighbouring sub-branches except Tocharian have been assumed to
be nearest to IIrn. In what follows, some important isoglosses are briefly
discussed.
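
The observation that every neighbouring branch except Tocharian has at some point been placed nearest to Indo-Iranian can be checked mechanically against the overview. The following Python sketch is only an illustration under stated assumptions: the table transcribes the group ranked first in each proposal, compound groups are split into their member branches, Hamp 1990 is left out because it opposes Indo-Iranian to all other branches at once, and Anatolian is omitted as the branch that splits off first. The names NEAREST, BRANCHES and never_nearest are hypothetical.

```python
# Groups ranked "1." (nearest to Indo-Iranian) in each proposal of the overview.
# Assumption: compound groups such as "Graeco-Italo-Celtic" are split into branches.
NEAREST = {
    "Schleicher 1860-62": {"Greek", "Italic", "Celtic"},
    "Gamkrelidze & Ivanov 1995": {"Armenian"},
    "Starostin 2004": {"Baltic", "Slavic"},
    "Ringe, Warnow & Taylor 2002": {"Baltic", "Slavic"},
    "Gray & Atkinson 2003; Bouckaert et al. 2012": {"Albanian"},
    "Chang et al. 2015": {"Baltic", "Slavic", "Germanic", "Italic", "Celtic"},
}

# Branches other than Indo-Iranian and Anatolian (assumed comparison set).
BRANCHES = {
    "Tocharian", "Italic", "Celtic", "Germanic", "Greek",
    "Armenian", "Albanian", "Baltic", "Slavic",
}


def never_nearest(table, branches):
    """Return the branches that no proposal places in its first-ranked group."""
    ever = set().union(*table.values())
    return branches - ever


print(sorted(never_nearest(NEAREST, BRANCHES)))
```

Under these assumptions the only branch left over is Tocharian, which matches the summary given in the text.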


14.5.2 Irrelevant Features: Shared Archaisms 


Many common features of Greek and Indo-Iranian are archaisms due to earlier
attestation of these branches already in the second millennium vs. all other NIE
branches. For example, preservation of:

* perfect as a distinct category 

* original simple imperfect (vs. renewed marked formations in Tocharian, 
Armenian, Italic, Slavic) 

* subjunctive and optative (vs. loss of optative in Celtic, Armenian, of
subjunctive in Germanic, Baltic-Slavic)

* vocabulary and poetic language. 

It is clear that such evidence is not relevant for subgrouping. 


12 Sergej Starostin, 2004, Handout, Workshop on the Chronology in Linguistics, Santa Fe. 

13 This is also the result of the most recent application of Bayesian methodology based on 
a strongly improved new database in Jena (IE-CoR, with my own participation). The best- 
supported tree configuration still shows Indo-Iranian nearer to a group comprising Balto-Slavic 
and Italic-Celtic-Germanic than to Greek, Armenian and Albanian, but all this with very low 
certainty.


14.5.3 Archaisms Shared with Anatolian (but not Greek)


Some other archaisms are shared with Anatolian but not Greek. The clusters 
*tst etc. were preserved in PIIrn. (> IA. + Nur. (?) *tt, Irn. *st, as elsewhere in
Eastern IE). Morphological archaisms are the middle 3sg. ending *-a(y) < *-ó(-) 
etc. and the active 3sg. ending -s (see Melchert 2015: 129-31; Kümmel 
2018a: 245-52; 2018b: 1912-14); maybe also the numeral *syá- ‘one’ 
(Kümmel 2016b) = Hittite sia-/sie- (but possibly also in Toch.B se, see 
Pinault 2006). 
Notably, the preservation of consonantal laryngeals seems to be better than
anywhere else in NIE:

* hiatus in Old Avestan and (less reliably) Vedic: e.g., subjunctive dāt {daat} =
  dhāt {dʰaat} < *dʰá(h)at

* some laryngeals survived as some kind of *-h- internally after stops into
  Iranian, causing devoicing of preceding obstruents (Kümmel 2016a: 82-3;
  2018c: 164-5):
  *majh- > *mach- > *mac- > mas-/maθ- ‘great’ (vs. *majah- > mazā-)
  *dadh- > *dath- > daθ- ‘put’ (vs. *dadah- > daδā-); *nabh- > *naph- > nāf-
  ‘navel’ (vs. *nabah- > naba-)
  *wabh- > *waph- > *waf- ‘to weave’; *dahiwar- > *dhaiwar- > *thaiwar- >
  *θaiwar- ‘brother-in-law’

* h-/x- appears to be sporadically preserved in marginal Western Iranian
  (Kümmel 2016a: 83; 2018c: 166): e.g., MPers. xirs ‘bear’, xāyag ‘egg’,
  xāk ‘dust’; hes ‘ploughshare’, hēsm/hēmag ‘firewood’, hanzūg ‘narrow’;
  Parthian hand ‘blind’. Especially the cases with x- can hardly be assumed
  to show a “prothetic” consonant. A similar case can be made for the eastern
  margin (Khotanese h-, see Kümmel 2020: 246)

* loss after i/u was probably only post-Proto-Iranian, cf. the contrast between
  lengthening and non-lengthening in cases like *wihrá- > MPers. wīr vs.
  Sogd. wir- ‘man’; *giywd- > MPers. zīw vs. *Ziwa- > Sogd. Zow- ‘alive’;
  *duhrá- > MPers. dūr vs. Khot. dura- ‘far’ (see Kümmel 2018c: 166-9).


14.5.4 Unique Archaisms = Shared or Parallel Innovation Elsewhere 


Indo-Iranian exhibits a few unique archaisms that contrast with innovations
elsewhere. One example is the middle 3pl. ending *-rá(y) < *-ró(-) etc., which is
not found anywhere else in the middle: all other branches including Anatolian
generalized an ending containing *-nt-. However, since the other branches do
not agree in detail, this cannot be used as an argument for an early separation of
Indo-Iranian vs. the rest. Two other morphological archaisms are the perfect
2pl. ending *-a < *-(H)e and the preservation of a distinct genitive vs. locative
dual only in Iranian, while all other branches either lack one or both of these
categories or show syncretism. In addition, there are numerous archaisms in
the inflection of individual words and stems.

Recapitulating the phylogenetic relations of the Indo-Iranian branch, we may
conclude the following:

* Indo-Iranian does not have a clear closest relative.

* It is rather distinct in some respects, so an early split seems quite possible 
(Hamp's scenario), but only under the assumption of continued areal contact. 

* There is good evidence for early proximity to Eastern Europe — with different 
developments shared with either the south (Greek, Albanian, Armenian) or 
the north (Baltic-Slavic, Germanic), or with the east (satem languages). 

* An original position at the eastern fringe of Europe is corroborated by 
contacts with both Western and Eastern Uralic.


15.1 Introduction


Since the times of Bopp and Schleicher, Baltic and Slavic have been treated 
as a single branch of the Indo-European language family. Throughout the 
nineteenth century, this view remained unchallenged, and it is presented as 
received wisdom in Brugmann's Grundriss (1897: 20-1). At the beginning
of the twentieth century, however, Meillet (1905: 201-2; 1922: 40-8) 
challenged the idea of a Balto-Slavic unity and argued that those similar- 
ities between Baltic and Slavic that are not archaisms inherited from 
(dialectal) Proto-Indo-European are due to parallel innovations. 
Throughout the twentieth century, the matter remained controversial. Balto- 
Slavic unity was defended by Rozwadowski (1912) and Vaillant (1950: 14), 
for example, while scholars like Senn (1941; 1970), Fraenkel (1950: 73-112),
Pohl (1992), Schmid (1992) and Andersen (1996) remained sceptical
and explained the similarities in terms of language contact and conver- 
gence. During the last quarter of a century, the communis opinio appears to 
have moved firmly in favour of the idea that there was indeed a period of 
shared innovations between Baltic and Slavic directly following the disin- 
tegration of the Proto-Indo-European parent language. As Olander (2015: 
24) aptly put it: “By tracing back the identical developments in the two 
branches to a common ancestor we obtain the simplest model of the 
relationship between Baltic and Slavic, without a notable loss of explana- 
tory power". 

Recent overviews of the shared Baltic and Slavic features that are relevant 
for the Balto-Slavic question can be found in Hock (2004, 2005), Euler (2007: 
10-15), Young (2017), Petit (2018) and Villanueva Svensson (in press). 
Excellent general overviews of the scholarly literature have been given by 
Petit (2004), Hock (2006) and Dini (2014: 200-13). This chapter will discuss 
the most compelling phonological, lexical and morphological evidence in 
favour of a Balto-Slavic clade, after which it will address dialectal variation 
within Proto-Balto-Slavic, the internal grouping of Balto-Slavic, external 
affiliations of Balto-Slavic and linguistic contacts of Proto-Balto-Slavic.
First, however, it is useful to take a brief look at Balto-Slavic from an
archaeological and palaeogenetic perspective.

Anthony (2007: 348) associated Balto-Slavic (pre-Baltic and pre-Slavic) 
with the Middle Dnieper culture that lasted from approximately 2800-2600
until 1900-1800 BCE. This is consistent with the linguistic evidence that the
speakers of Balto-Slavic practised little agriculture (Pronk & Pronk-Tiethoff
2018: 304-8). Together with the closely related Fatyanovo culture to its
north-east, the Middle Dnieper culture covers the area in which Baltic- or
Balto-Slavic-looking hydronyms are found (Gimbutas 1963: 91; Anthony 2007:
380). Both these cultures belong to the larger Corded Ware horizon. 

The split between Baltic and Slavic must have taken place a long time after 
the split of Balto-Slavic from other Indo-European groups in view of the large 
number of Balto-Slavic innovations. A date much before the beginning of 
the second millennium BCE is therefore unlikely. This makes it questionable 
whether the people who introduced genes from the Pontic-Caspian Steppe into 
the Baltic region during the third millennium BCE (Mittnik et al. 2018) and the 
people of the Rzucewo or Bay Coast Culture of the same period were speakers 
of early Baltic (pace Rimantienė 1992). They might have been the ancestors of
Balto-Slavic speakers, as suggested by Kortlandt (2018a), in which case the
idea that Balto-Slavic was still spoken on the Middle Dnieper during the third 
millennium BCE must be rejected. It seems more likely that the people who 
brought steppe genes into the Baltic region in the third millennium spoke 
another, now lost, dialect of Indo-European (cf. Kortlandt 2018a).

In the basin of the Dnieper river, the speakers of Balto-Slavic apparently 
picked up names for fish such as the wels catfish (Lith. šãmas, Ru. som), tench
(Lith. lýnas, Ru. lin’), sturgeon (OPr. esketres, Ru. osëtr) and perhaps ruffe
(Lith. ež(e)gys, Pol. jażdż, jazgarz).¹ The importance of rivers and fishing for
the speakers of Balto-Slavic may also be reflected in the fact that Baltic and
Slavic uniquely share verbs for wading (Lith. 3pres. breñda, Ru. 1sg.pres.
bredu) and diving (Lith. nerti, RuCS vъ-nrěti), and nouns for spawning (Lith.
nerstas, Ru. nérest), dugout canoe (Lith. eldija, OCS aldii) and raft (Latv. pluts,
Ru. plot). The Baltic name for the pike (Lith. lydys, OPr. liede), a fish that was
an important food source in the Baltic area during the Neolithic (Rimantienė
1992: 105), has no cognate in Slavic, but this could be due to a later 
replacement. 

From the middle Dnieper region, the ancestors of the speakers of West and
East Baltic would have moved along the rivers into the forests to the north,
where they borrowed words for woodland animals such as the elk (Lith.
bríedis, Latv. briédis, OPr. braydis), woodpecker (Lith. genys, Latv. dzenis,
OPr. genix), hawk (Lith. vanagas, Latv. vanags, OPr. spergla-wanag
‘sparrow-hawk’) and perhaps bear (Lith. lokys, Latv. lācis, OPr. clokis) from
an unknown non-Indo-European language. Because there are very few shared
innovations between Old Prussian and East Baltic (see Section 15.3.2), it would
seem likely that they were spoken by different groups shortly after the
migrations to the north and north-west from the Dnieper basin. Most if not all
common East Baltic innovations, including the creation of new locatival cases
due to contact with another, most probably Uralic language, could have taken
place before the East Baltic languages entered the Baltic coastal areas.


1 Because of the different vowels in the suffix, it seems likely that Lith. lašišà and Ru. losos’
‘salmon’ were borrowed independently from similar sources, as was OHG lahs ‘salmon’.

The speakers of pre-Proto-Slavic would originally have occupied the area 
between the Middle Dnieper and Upper Dniester (Anthony 2007: 379-80). 
Before their spread across Central and Eastern Europe after 500 CE, they can 
be most probably located to the north-east of the Carpathian mountains (Udolph 
1979: 619-23) and have often been associated with the Zarubintsy culture (appr. 
300 BCE-100 CE, see e.g. Maksimov in Rusanova & Symonovič 1993: 36-9).

A study of the Y chromosome of Slavic populations supports the hypothesis 
that the Slavic expansion started from present-day Ukraine (Rębała et al. 2007).
So far, no support for Proto-Balto-Slavic has been found in studies of DNA.
Rębała et al. (2007) found significant differences in Y-chromosomal haplogroup
distribution between Slavic and Baltic populations. Baltic populations
are genetically the closest to East Slavs, but this is probably due to a Baltic 
substrate in northern East Slavic (Kushniarevich et al. 2015). 


15.2 Evidence for the Balto-Slavic Branch 


15.2.1 Phonology and Relative Chronology 


In a 2005 article, Matasović (2005b) discussed the following eleven
phonological innovations that are found in Baltic and Slavic:

1. depalatalizations of palatovelars
2. satemization
3. the ruki rule
4. Hirt's Law²
5. the development of syllabic resonants
6. Lidén's Law³
7. loss of word-final *-d
8. Winter's Law⁴
9. *o > *a
10. deaspiration of the aspirated stops⁵
11. loss of laryngeals.⁶


2 I.e. a stress retraction onto a preceding syllable in which the nucleus was followed by a laryngeal.

3 I.e. loss of word-initial *u- before *-r-, perhaps also before *-l-.

4 I.e. lengthening of a preceding vowel and introduction of acute intonation in a preceding syllable
by what are traditionally reconstructed as voiced unaspirated stops.

Matasović concluded that these innovations could have occurred in the same
chronological order in Baltic and Slavic and that no Baltic or Slavic innovation
can be shown to have occurred before these innovations. The relative chronology
of Balto-Slavic sound changes set up by Kortlandt (2011: 157-76; 2009: 43-6) leads
to the same conclusion. The list of shared innovations can be extended by 
adding, e.g., the evolution of Baltic and Slavic mobile accentuation (Pedersen 
1933; Olander 2009, 2019; Jasanoff 2017; Kortlandt 2018b). The exact phon- 
etic conditions of some of the sound laws and their exact chronological order 
remain a matter of debate (cf. Hock 2006 with ample references to the relevant 
literature), but this does not affect the conclusion that Baltic and Slavic had 
a long shared history after Proto-Indo-European had dissolved. 


15.2.2 Shared Innovations in the Core Lexicon 


The existence of a unitary Balto-Slavic proto-language is confirmed by the fact 
that Baltic and Slavic share a number of lexemes belonging to the core vocabu- 
lary that are either not found in other Indo-European languages or that show 
identical morphological or semantic innovations compared to cognates in other 
Indo-European languages. The examples can easily be drawn from Trautmann's
1923 dictionary or from Sławski 1970. The following seventeen etyma with
a meaning that is usually thought to belong to the core vocabulary are exclusively
Balto-Slavic: *put- ‘bird’, *konʔd- ‘to bite’, *skeit- ‘to count’, *touʔk- ‘fat’,
*nog- ‘foot, leg’, *ronkaʔ ‘hand, arm’, *golʔuaʔ ‘head’, *rogos ‘horn’, *ledus
‘ice’, *ke/ol- ‘knee’, *edźero ‘lake’, *uelk- ‘to pull’, *dź/guaizd- ‘star’, *soledus
‘sweet’, *met- ‘to throw’, *bo/ēlʔ- ‘white’, *su(n) ‘with’. Based on the 1971
Swadesh 100 list or the 2019 Jena 170 list (see
www.eva.mpg.de/linguistic-and-cultural-evolution/research/ie-cor/) of core
lexical meanings, this amounts to around 10 per cent of the total reconstructed
Proto-Balto-Slavic basic lexicon.
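
The proportion can be retraced with a few lines of Python. This is only a sketch: it assumes that all seventeen meanings figure on the list used as the denominator and that the full 170-item Jena list is reconstructible for Proto-Balto-Slavic, neither of which is stated explicitly in the text; the names GLOSSES and share are hypothetical.

```python
# Glosses of the seventeen exclusively Balto-Slavic core etyma listed above.
GLOSSES = [
    "bird", "to bite", "to count", "fat", "foot, leg", "hand, arm",
    "head", "horn", "ice", "knee", "lake", "to pull", "star",
    "sweet", "to throw", "white", "with",
]


def share(n_items, list_size):
    """Proportion of a core-meaning list covered by n_items etyma."""
    return n_items / list_size


# Assumed denominator: the 170 meanings of the Jena list.
print(f"{share(len(GLOSSES), 170):.0%}")
```

Against the shorter Swadesh 100 list the same count would give a different proportion, so the figure depends on which list, and which subset of it, is taken as the basic lexicon.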


15.2.3 Shared Morphological Innovations


There are numerous shared innovations between Baltic and Slavic in
morphology. The following list is far from complete, but it contains those items
that are fairly indisputable. For these and other proposed shared innovations, the
reader is referred to the literature cited in the introduction, especially Hock
2005 and Villanueva Svensson in press, as well as Stang 1966: 18-20, Gołąb
1992: 50-1, and Kortlandt 2016c, 2018c.


5 I.e. merger of what are traditionally reconstructed as mediae and mediae aspiratae.

6 This should be changed into the merger of the laryngeals into a single segment, probably a glottal
stop. The eventual loss of this segment occurred independently in Baltic and Slavic in view of
OCS kamy ‘stone’ < *kaHmon, with metathesis from PIE *h₂eḱmōn, but Lith. akmuõ ‘stone’
without metathesis (although Matasović 2005b: 152 does not consider this evidence to be
conclusive). On the dating of the loss of the laryngeals as segments in Balto-Slavic, see also
Kortlandt 2009: 6.

Shared innovated nominal endings: 

* o-stem gen.sg. *-ā (Lith. -o, OCS -a, in OPr. -as enlarged with -s, see below)
< PIE abl. *-oed

* the generalized consonant stem gen.sg. *-es (OLith., OPr. -es, OCS -e) ←
PIE *-es, *-os

* consonant stem instr.pl. *-miʔs (Lith. -mis, OCS -mi) ← PIE *-bʰis(?)

* adjectival o-stem neuter nom.acc.sg. *-o (Lith. -a, OPr. -a, OCS -o) < PIE
pronominal *-od

Shared innovations in nominal derivation: 

e deadjectival abstracts and nomina actionis in *-b- (Lith. -ba, -yba, -ybé, OCS 
-bba, zelobvo, zvloba ‘malice’, Arumaa 1955; probably from PIE *-b’h2- ‘to 
become") 

e deverbal abstracts in *-imo (Lith. -imas, OCS -omo, ultimately < PIE *-mn- 
(Pronk 2014)) 

* grammaticalization of the adverbial ending *-ai (Lith., OPr. -ai, OCS -é) < 
PIE loc.sg. *-oi 

Shared innovations in the morphology of the verbal system: 

* preterits/aorists in *-ā (Lith. -o, OPr. -a, OCS aor. -a)

* verbs with pres. *-ouʔie/o-, pret. *-ouā (Lith. pres. -auja, pret. -avo, OPr.
3pres. -awie, OCS pres. -ujǫ, aor. -ova)

e statives in *-e2- with an i-present (OPr. turit, turri ‘have’, Lith. budeti, büdi, 
ORu. bvdeti, bəèdim» ‘be awake") 

* perfects joining the preceding category (Lith. garéti, gäri ‘evaporate’, ORu. 
goréti, gorito ‘burn’) 

e transformation to a thematic present of PIE perf. *mog"- ‘be able’ 

e present stems *do2d- ‘give’ and *ded- ‘put’ (OLith. duosti, dest, OPr. dast, 
OCS dasto, -dezdo) — PIE pres. *di/e-deh3-, *d'i/e-d'eh;- 

e 2sg. pres. *esezi ‘you are’ + PIE */r;esi (Lith. esi, OPr. assai, OCS jesi) 
(Kortlandt 2009: 156) 

e causatives in *-(e)i- (Lith. báudinti, baudyti ‘urge’, OCS voz-buditi 
‘awaken’) — PIE *-eie- 

* oblique forms ofthe masculine and neuter present active participle in *-ont-ie/o- 
(e.g., gen.sg.m. Lith. nésancio, OCS nesosta ‘carrying’) 

e infinitives in *-izt(e)i with analogical *-2- after infinitives in *-e»t(e)i and 
*-at(e)i (Lith. -yti, -eti, -oti, OCS -iti, -éti, -ati) 

Further, there are some nouns in which Baltic and Slavic have (near) identical 

derivatives from Indo-European roots. In Trautmann's 1923 dictionary we find, 

inter alia, Lith. ävinas, OPr. awins, ORu. ovon» ‘ram’, Lith. artöjas, OPr. 
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artoys, OCZ. rataj ‘ploughman’, Lith. plaüciai, OPr. plauti, OCS plusta ‘lung- 
(s)’, Lith. dial. péntis, OPr. pentis, OCS peta ‘heel’. 


15.2.4 Shared Syntactic Innovations 


Due to the difficulty of reconstructing Proto-Indo-European syntax, it is also 
difficult to identify any syntactic innovations that Baltic and Slavic may have 
shared. In general, there are few methodological tools that we can use to 
determine whether any similarities in the structural properties of Baltic and 
Slavic are due to shared inheritance, shared innovation, independent innovation 
or mutual influence. Therefore, “the issue of Balto-Slavic ‘unity’ ... should 
center around phonology, morphology, and the lexicon" (Holvoet 2018: 2001). 

A seemingly shared Balto-Slavic syntactic feature is reflected in the definite 
adjectives that are attested in both branches, e.g. Lith. gerasis, OCS dobryi ‘good’. 
These definite adjectives derive from a nominal sentence in which a relative 
pronoun connects two nominal forms, agreeing in case, number and gender with 
the first of these nominals (Petit 2009). Parallels for such a construction are found 
in Iranian (Meillet 1922: 44). This syntactic construction “predat[es] at least the 
split between Balto-Slavic and Indo-Iranian" (Widmer et al. 2017: 811) and is 
likely to be an archaism inherited from PIE (Petit 2009: 354-5). The only plausible
shared Balto-Slavic syntactic innovation reflected in the definite adjectives is the
agreement between the relative pronoun and the head of the construction, which is
also found in Iranian (Petit 2009: 354-5).

The most promising example of a syntactic innovation that is shared by Baltic 
and Slavic only and less likely to have arisen independently or as a result of contact 
between Baltic and Slavic is the complete loss of the Proto-Indo-European middle 
voice and its replacement by reflexive verbs in at least some of its functions. See 
Holvoet 2020 for an extensive discussion of this issue. 


15.3 The Internal Structure of Balto-Slavic 


15.3.1 Proto-Balto-Slavic Dialectal Differentiation 


One might wonder whether any dialectal differentiation that might have been 
present in Proto-Balto-Slavic was carried over into Baltic and Slavic. 
According to Olander (2015: 24) “there are cases of variation that cannot be 
avoided in a reconstructed Balto-Slavic proto-language, such as the existence 
of different lexemes for the same notion, or the existence of variants with initial 
*a or *e in the same lexeme in different areas (Andersen 1996: 206 and 
passim)". Because the lexical data is open to various interpretations, I will 
here focus on the variants with initial *a or *e, such as Ru. orël but Lith. erẽlis
‘eagle’ < PIE *h₃er-l-.

Andersen proposed a scenario in which the variation arose within a
Baltic-Slavic dialect continuum, even before some of the common Balto-Slavic
innovations mentioned at the beginning of this chapter (1996: 106-7). The
dialectal variants would have continued to coexist throughout the Proto-Slavic
and Proto-East-Baltic periods and, in some cases, in the modern Slavic and
Baltic languages. Such a long period of coexisting variants of the same words is
highly unlikely and not supported by the data. Instead, branch-internal
mechanisms caused the rise of the variation in initial vocalism.

In Slavic, it has long been clear that the variation between initial je- (< *e-, 
*je- or *ja-) and o- (« *a-) cannot be separated from that between u- and ju- 
in OCS utro, jutro, or that between a- and ja- in OCS aviti, javiti, ORu. azv, 
jaz». The variation is due to sandhi variants that arose when a yod developed 
in hiatus between two vowels, one of which was a front vowel (Pedersen 
1905: 311). Similarly, words with an initial vowel developed a sandhi 
variant with initial *u- if they were preceded by a word ending in 
a rounded vowel, e.g. Cz. vejce ‘egg’ < *ajece. Some instances of initial 
je- are the result of the regular umlaut *ja- > *jä- > *je- and thus originally 
positional variants of *a- > *o-. The alternations between initial *uo-, *je- 
and *o- and between *e- and *je- in sandhi led to the generalization of one of 
the variants, and sometimes to the analogical introduction of an etymologic- 
ally “incorrect” onset, e.g. in the word for ‘wasp’, which is *osa in almost all 
of Slavic, but vosa in Czech. The Czech form is the older variant in view of 
outer-Slavic cognates such as Lith. vapsva and Lat. vespa. The variant *osa 
must be due to reinterpretation of *vosa as a sandhi variant after rounded 
vowels (Pedersen 1905: 312). 

There is no reason to assume that the Baltic variation between initial a- and 
e- and the Slavic alternation between initial o- and je- are in any way related 
(see further Derksen 2002; Kortlandt 2011: 255-8). They therefore provide no 
evidence for a Balto-Slavic dialect continuum, nor for a shared innovation. 

The strongest potential evidence for inner-Balto-Slavic variation that I am 
aware of is the 1sg. personal pronoun */;eg, that underwent Winter’s Law 
(> Proto-Balto-Slavic *e2d3) and produced ORu. ja. In Baltic, the same 
pronoun has a voiceless sibilant and a short vowel: OLith. es, Latv. es, OPr. 
as, es. The Baltic forms seem to suggest that there was a positional variant 
*h ek before a following word beginning with a voiceless consonant that did 
not undergo Winter's Law. If this is correct, Slavic and Baltic may have 
generalized different sandhi variants. The generalization of one of the variants 
could of course have happened at any point after Winter's Law, and not 
necessarily before the dissolution of Proto-Balto-Slavic. Other explanations 
are also conceivable. Kortlandt (20132), for example, argued that the Baltic 
forms and Slavic *ja are the result of post-Proto-Balto-Slavic shortenings of 
original *erdzun, preserved in Slavic as *(j)azb (e.g. ORu. jaz»). In either 
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scenario, there is no compelling evidence for internal differentiation within 
Proto-Balto-Slavic that was carried over into Baltic or Slavic. 


15.3.2 Internal Grouping


Traditionally, Balto-Slavic has been divided into Baltic and Slavic, with
a further split between West and East Baltic after a period of common Baltic
innovations. The separate status of Slavic is evident, but the existence of
a period of common Baltic innovations is more difficult to demonstrate; see
most recently Villanueva Svensson 2014, Hill 2016 and Kortlandt 2018c with
references to the older literature. Stang (1966: 2-10) lists the similarities
between the Baltic languages that set them apart from all other
Indo-European languages, including Slavic (notation as in the original):

* complete merger of the 3sg. and 3pl. verbal endings 

* two preterit classes in *-ē and *-ā

* a distribution between the 3rd person verbal endings *-ti to monosyllabic
stems and *-t > zero to polysyllabic stems

* 1sg. athematic *-mái

* a thematic vowel -a- < *-o-, never *-e-

* nominal ē-stems

* intrusive *k before consonant clusters beginning with *s

* nomina actionis with the suffix *-sian-, perhaps also *-sen-

* nouns in *-ūnas

* diminutive suffixes *-eliia-, *-už-, *-ut-, *-ait- (also in patronymics)

* adjectives in *-ing-

* identical compound names, often with a binding vowel *-i-

* ā-presents to verbs in *-īti

* sta-presents to middle/intransitive verbs

* causatives in *-ina-

* a large amount of uniquely shared lexicon, including identical derivatives 
from inherited roots and semantic innovations in inherited material (cf. Petit 
2010: 10-11). 

To these we can add the loss of *-j- between a consonant and a front 

vowel (Villanueva Svensson 2014: 165) and the identical restructuring of 

some Proto-Indo-European consonant stems and root nouns: Lith. akis, 

OPr. ackis ‘eye’, Lith. ausis, OPr. acc.pl. ausins ‘ear’ (Hill 2016: 210-11), 

Lith. saule, OPr. saule ‘sun’, Lith. gerve, OPr. gerwe ‘crane’, Lith. zeme, 

OPr. semme ‘earth’, Lith. diena, OPr. acc.sg. deinan ‘day’. Other proposed 

shared innovations, such as the change of *-iia to *-& (Petit 2010: 6; 

Villanueva Svensson 2014: 165; cf. also Hill 2016) and the shortening of 

unstressed *-; « *-eie- (Hill 2016: 214—22; Villanueva Svensson 2019), 

remain the subject of debate. 
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In the former case, if there was a raising of *-iid to *-iié, it may well have 
been shared by Slavic, cf. the type OCS mlenii (f.) ‘lightning’ < *-iie. This 
leaves the contraction and associated metatony as potentially shared Baltic 
innovations, but consider the general preservation of *a after yod in other 
positions (e.g. Lith. jóti ‘to ride’, bijoti ‘to fear’, valia ‘will’ etc.) and further 
objections raised by Kortlandt (2018c). The alleged change of *-iià to *-é thus 
remains poorly understood and cannot serve as evidence for the branching of 
Balto-Slavic. 

Most evidence for Hill's contraction of unstressed *-ī- < *-eie- is judged to
be inconclusive by Villanueva Svensson (2019), except for the PIE i-stem dat.
sg. ending *-eiei, for which the common Baltic evidence would be the ti-stem
dative *-tī < *-teiei (Skt. -taye) that was grammaticalized as an infinitive
(Lith. -ti, Latv., OPr. -t). We are thus dealing with a sound law that explains
only a single morpheme, which weakens it considerably. Moreover, the Baltic 
infinitive ending *-ti has a potential counterpart in Slavic. Next to the well-
known Slavic infinitive ending *-ti, there is a widespread variant *-tь, which
could go back to Balto-Slavic *-ti. There cannot have been a general reduction
of unstressed *-i to *-ь in Slavic, because nominal endings in -i, e.g. several
forms of the i-stems, nom.pl. -i in the o-stems, instr.pl. -mi etc. are never
reduced (cf. Vaillant 1950: 219-20). This means that the shortening in the
infinitive of unstressed *-tī > *-ti > *-tь, if that is indeed how the Slavic variants
arose, only affected the specific pre-Proto-Slavic sequence that produced -i in
the infinitive and perhaps in the athematic imperative, cf. OCS daždь ‘give!’ <
2sg. optative *-ieh₁-s(?). However, it did not affect the dat.sg. ending -i of the i-
and u-stems, which was also unstressed. In short: the Baltic infinitive ending 
*-ti has a potential parallel in Slavic, so the alleged shortening of an alleged 
Proto-Balto-Slavic infinitive ending *-tī cannot be used as evidence for a Proto- 
Baltic stage. 

Many of the shared features of West and East Baltic can be and have been 
argued to be either inherited from Proto-Balto-Slavic and lost in Slavic or 
independent innovations, most prominently by Kortlandt (2018c with refer- 
ences to earlier works). In order to demonstrate that there was indeed 
a period of shared Baltic innovations, the innovated feature must not only 
be shared by West and East Baltic, it must also be shown to have never 
existed in Slavic, and its introduction should not be a trivial development. 
Few of the shared features collected by Stang and others fulfil these criteria. 
The shared derivational suffixes on Stang’s list could all have been lost in 
Slavic. The same is true for lexical items such as Lith. turėti, Latv. turēt,
OPr. turritwei ‘to have’ and Lith. gimti, Latv. dzimt ‘to be born’, OPr. 
gemmons ‘born’. The semantic innovation in Lith. giria, Latv. dzira ‘forest’, 
OPr. garian ‘tree’ versus OCS gora ‘mountain’ turns out to be trivial if one 
takes a closer look at the semantics of the Slavic cognates, cf. Bulg. gora 
and Slk. hora ‘forest’. The word appears to have designated a wooded slope
or mountain in Proto-Slavic and Proto-Balto-Slavic. The semantic innov-
ation in Lith. médis ‘tree’, Latv. mežs, OPr. median ‘forest’ versus OCS
mežda ‘boundary’ is also trivial, cf. Sln. dial. mej ‘forest’ from the same root
and the connection between Lith. vidus ‘middle’ and Old English widu
‘wood’.

The most robust evidence for a Proto-Baltic period is, in my view, presented 
by the productivity of nominal ē-stems (whatever their origin), the (near)
merger of 3sg. and 3pl. verbal forms, the loss of *-j- between a consonant 
and a front vowel and the identical evolution of a number of former consonant 
stems and root nouns. This seems to suggest that there was indeed a Proto- 
Baltic period, which lasted for at least a few generations but probably no longer 
than a few centuries. 

It has long been clear that West and East Baltic are also separated by some 
isoglosses that connect East Baltic with Slavic. The most often cited examples 
are the following (see Villanueva Svensson in press for a few more inconclu- 
sive examples): 

* the o-stem gen.sg. ending (Lith. -o, OCS -a < PIE abl.sg. *-oed versus OPr. 
-dS) 

e the initial consonant in the word for ‘nine’ (Lith. devyni, OCS devete versus 
OPr. newints *ninth") 

* the word for ‘third’ (Lith. trécias, OCS tretii versus OPr. tirts, tirtis) 

* presence versus absence of -s- in the dat.sg. and loc.sg. of the demonstrative 
pronoun (Lith. tamui, tame, tái, toje, OCS tomu, tomo, toi versus OPr. stesmu, 
stessei). 

It is, however, uncertain that these isoglosses are the result of shared innov-
ations of only East Baltic and Slavic. In the first three cases, East Baltic and
Slavic may preserve the Proto-Balto-Slavic situation, and in the fourth case
they may have innovated independently.

The Prussian o-stem gen.sg. ending -as has been explained from PIE 
*-oso, *-osio, *-os, as analogical to the feminine a-stem ending -as (Leskien 
1876: 31-3), or from the same *-oed as East Baltic with addition of the 
genitive singular marker *-s (Vaillant 1958: 30; see further Rinkevičius 2015: 
106-7 with literature). The latter explanation seems to be the least problematic
phonetically, and it has been suggested that traces of an earlier s-less ending 
-ā, -u may exist within Old Prussian (Leskien 1876: 33-4; Girdenis & Rosinas
1977: 3; Kortlandt 2009: 192). There is therefore no demonstrably old dis- 
tinction between West and East Baltic in this ending. 

The introduction of d- in ‘nine’ (see above) is due to anticipation of the d- of
‘ten’ when counting. It is plausible that it first affected the cardinal and then 
spread to the ordinal numbers. For Proto-Balto-Slavic, one may then 
reconstruct *deuin ‘nine’, *neuintas ‘ninth’, with preservation of the latter in
OPr. newints.⁷

It is possible that East Baltic and Slavic shared the replacement of *tirtiios 
‘third’, reflected in OPr. tirts, tirtis, by *tretiios. It is, however, equally con-
ceivable that the Prussian word was influenced by *ketuirtas ‘fourth’ after the
dissolution of Proto-Balto-Slavic (Mažiulis 2013: 912). It would then replace 
earlier *tretiios, which is itself best understood as a replacement of an even 
older *tritiios, cf. Av. θritiia-, Lat. tertius, Goth. þridja < *tri-t(H)-iHo-, on the
basis of *treies ‘three’. If that is the case, the resemblance between OPr. tirtis
and Skt. tṛtīya- ‘third’ is coincidental.

The analogical removal of -s- in the pronominal dat.sg. and loc.sg. Lith. 
tamui, tame, tái, toje and OCS tomu, tomb, toi was an innovation in contrast to 
its preservation in OPr. stesmu, stessei ‘that’, cf. Skt. tasmai, tasmin, tásyai, 
tasyam. " The replacement was part of the general loss of the distinction 
between the direct and oblique cases in the pronoun, cf. OPr. dat.sg.f. tennei 
‘her’ — *tenness(i)ei after nom.sg. tennd ‘she’, but preservation of dat.sg.f. 
stessiei to nom.sg.f. stai. It is conceivable that the removal of -s- occurred 
independently in East Baltic and Slavic, as in OPr. tennei. The removal of -s- 
was ultimately the result of the elimination of the suppletive nominatives 
m. *sa and f. *saH, which probably took place after the dissolution of Proto- 
Balto-Slavic as well (Kortlandt 2009: 139). 

It seems most likely that, after the dissolution of Proto-Balto-Slavic, West 
and East Baltic remained a single unit for a relatively short period. There may 
have been a few shared innovations between East Baltic and Slavic during this 
same period, although the evidence is not very robust. If this is indeed the case, 
however, the dissolution of Balto-Slavic could be seen as a gradual process 
with increasing dialectal differences, “with East Baltic as an intermediate 
dialect between West Baltic and Slavic” (Kortlandt 2018c: 176).


15.4 The Relationship of Balto-Slavic to the Other Branches 


15.4.1 Genealogical Relations 


The perpetual question as to whether there was a period of shared Balto-Slavic 
and Germanic innovations is probably to be answered in the negative. The key 
argument has always been the *-m- of the dat. and instr.du.pl. endings in Balto-
Slavic (pl. OLith. -mus, -mis, OCS -mъ, -mi) and the dat.pl. in Germanic (Goth.,


7 It cannot be ruled out either that the n- of Old Prussian is due to German influence (Derksen
2015: 126). 

8 Hill's (2016: 224-7) explanation of the loss of *-s- as phonetic in unstressed position before *-m-
is unconvincing. This highly specific and phonetically problematic sound law is set up to explain 
the single morpheme *tosm-. It does not account for the feminine forms. 
OHG -m) that contrast with *-bʰ- in the instr.pl. in Greek (-φι) and Armenian
(-b), dat. and instr.du.pl. in Indo-Iranian (pl. Skt. -bhyas, -bhis) and dat.pl. in
Italo-Celtic (Lat. -bus, OIr. -b). Because *-bʰ- is most clearly at home in the PIE
instrumental plural ending, and *-m- cannot have arisen out of thin air, it is
likely that the Germanic and Balto-Slavic dative plural endings are archaic
(Hirt 1895; Beekes 2011: 188). In other words, the Core Indo-European ending
contained an *-m-, which was replaced by *-bʰ- from the instrumental in Latin
and Indo-Iranian, while in Slavic instrumental *-bʰ- itself was replaced by *-m-
from the dative (see Olander 2015: 269-70 for alternative views). It is clear that
a common innovation of the dat.pl. ending in Germanic and Balto-Slavic 
cannot be substantiated. There are no other common innovations in the nominal 
declension (Leskien 1876), nor are there any shared phonological innovations. 
Parallel syntactic structures, such as the absolute dative or the genitive of 
negation, cannot be used as evidence because they can represent (partial) 
archaisms or reflect parallel innovations. Any evidence for a period of shared 
Germano-Balto-Slavic innovations must thus come from the lexicon, nominal 
derivation or verbal inflection. 

A significant part of the vocabulary that is shared exclusively by Germanic and
Balto-Slavic, collected and discussed by Stang (1972) and Nepokupnyj et al. 
(1989), consists of words belonging to semantic fields that are prone to borrow- 
ing, such as flora and fauna. Some of the correspondences from semantic fields 
other than flora and fauna could easily be archaisms inherited from Proto-Indo- 
European, e.g. Goth. ju, Lith. jañ, OCS (j)u-ze ‘already’ < PIE *hzieu; ON lyór, 
Lith. lidudis, OCS ludije ‘people’ < PIE *A;leud'-i-; ON ljóór, OCz. l'ud 
‘people’ < PIE *h,leud'-o-; MLG noster(en) ‘nostril’, Lith. nasrai ‘snout’, 
OCS nozdri ‘nostrils’ < PIE *nh>-(e)s-r-, ON surr ‘sour, bitter’, Latv. sürs 
‘salty, bitter’, OCS syra ‘damp’ < PIE *suH-ro-; OPr. tüsimtons, OCS tysesti, 
Got. pusundi ‘thousand’ < PIE *tuHs-dkmt-. The remaining shared vocabulary 
does not contain any obvious replacements of Proto-Indo-European basic 
vocabulary and is not numerous enough to warrant the reconstruction of 
a period of joint Germanic and Balto-Slavic innovations. 

A morphological argument often adduced in favour of a Germano-Balto- 
Slavic node is the shared adjectival suffix *-isko-, Goth. -isks, Lith. -iskas, OCS 
-bsko, which primarily indicates origin from a particular place (Kluge 1926: 
104). The suffix may have been created by adding adjectival *-ko- to local 
adverbs in *-is of the type Skt. bahih ‘outside’, avih ‘manifestly’. If there was 
no Germano-Balto-Slavic node, the suffix must have arisen in a small number 
of forms in Proto-Indo-European and have become productive independently 
in Germanic and Balto-Slavic but have been lost elsewhere. This is conceiv- 
able. Vaillant's (1958: 682) idea that the Slavic suffix was borrowed from 
Germanic and the Baltic one from Slavic seems unlikely, especially in view 
of Lithuanian -š-.
Another innovation perhaps shared between Balto-Slavic and Germanic is
found in the semantics of nasal presents (Villanueva-Svensson 2011 with 
references). It has long been recognized that nasal presents in these languages 
are predominantly intransitive and have inchoative or fientive semantics, e.g. 
Goth. ga-waknan ‘to wake up’, Lith. uz-migti, -minga ‘to fall asleep’, OCS 
vez-bonoti ‘to wake up’. In other branches, nasal presents typically form 
causatives, factitives and intensives (see Meiser 1993 with references), but 
cf. Lat. -cumbo ‘lie down’. In Greek, Indo-Iranian, Tocharian and Anatolian, 
nasal presents are mostly transitive in the active form, though not exclusively, 
cf., e.g., Gr. φθίνω ‘to decline, decay’. Some nasal presents in Balto-Slavic, on
the other hand, are transitive, e.g. Lith. gauna ‘to obtain’ and OCS tokno ‘to 
stab’. 

The question as to whether the semantics of those Germanic and Balto- 
Slavic nasal presents that are inchoatives or fientives reflect a shared innovation 
depends on the reconstruction of the (pre-)Proto-Indo-European function of the 
nasal verbal suffix. Old Indo-European nasal presents are typically formed to 
roots with telic semantics. The nasal present appears to signify change of state 
(rather than “starkes Betroffensein”, Meiser 1993: 295) of the object of
a transitive verb (cf. PIE *ui-n-d- ‘find’) or the subject of an intransitive 
(unaccusative) verb. In addition, it is relevant that the suffix became a present 
marker and is never found in the aorist or perfect. This means that the oldest 
layer of nasal presents must have had progressive or ingressive semantics. They 
would thus have described the process of a change of state of either subject or 
object. Whether the nasal presents ended up as factitives and causatives or 
inchoatives and fientives depended on whether they were derived from 
a transitive or intransitive base. It has been argued that the intransitive 
Germanic and Balto-Slavic nasal presents derive from intransitive thematic 
aorists (Stang 1966: 340 for Balto-Slavic), middle root aorists (Kortlandt 2010: 
219-20 for Germanic) or from the middle of the nasal present (Villanueva- 
Svensson 2011: 43; Kroonen 2012: 270 n. 11 for Germanic; cf. also Meiser 
1993: 291-3). At least in Baltic, some nasal presents were derived from
perfects: Lith. kañka ‘hang’, randa ‘find’, tampa ‘become’, pranta ‘acquire
a habit or inclination’ (Stang 1966: 313, 315). The productivity of transitive or 
intransitive nasal presents, or indeed the lack of them, could be taken as 
a potential shared innovation of some branches of Indo-European, but it is 
a rather trivial development as long as it is assumed that both types existed in 
Proto-Indo-European. As an argument for a Balto-Slavo-Germanic node, the 
semantics of the nasal present are not particularly forceful. 

A closer relationship between Balto-Slavic and any of the other branches is 
difficult to demonstrate as well. According to Kortlandt (2016a), “[t]he closest
relatives of Balto-Slavic are Albanian and Indo-Iranian”, but shared innov- 
ations are few. Potentially shared phonological innovations are satemization, 
which is also shared with Armenian, and the ruki rule, which possibly affected
Armenian as well. In both cases, the shared innovation would have been the 
initial phonetic development, because the phonemicization of the rules is 
branch specific. Because phonetic changes can be reversed, it is impossible to 
show that none of the other branches took part in the initial, phonetic stages of 
satemization or the ruki rule as well. Consider in this respect the alleged satem 
reflexes in Luwic (Melchert 2012 with literature) and the Hieroglyphic Luwian 
sign sa3, which occurs mainly in the vicinity of the ruki sounds (Rieken 2010). 

Kortlandt (2018d: 287) proposed that the loss of a laryngeal between two 
vowels was a shared innovation of Balto-Slavic and Indo-Iranian. Laryngeals 
were also lost in this position in all other branches of Indo-European except 
Anatolian. In Greek, this loss produced a disyllabic sequence, but in Indo- 
Iranian and Balto-Slavic the result is a monosyllabic long vowel. In Indo- 
Iranian, laryngeals were also lost if the second vowel was *i or *u, producing 
a monosyllabic diphthong (Lubotsky 1995). In Balto-Slavic, the laryngeals 
were initially retained before *i and *u and eventually produced acute accen- 
tuation. The loss of intervocalic laryngeals was therefore an independent 
innovation in Balto-Slavic and Indo-Iranian. 

Grammatical features shared by Indo-Iranian and Balto-Slavic are all archa-
isms (cf. Kortlandt 2016a). Kortlandt adduces the acc.sg. *h₁mēm (Skt. mām,
OCS mę) for older *h₁me (Gr. ἐμέ) ‘me’ as a shared innovation, but this is
incorrect. Skt. mām is sometimes disyllabic, which is best explained by assum-
ing that it reflects PIE *h₁me with the Indo-Iranian suffix *-Ham of Skt. aham
‘I’, tuvam ‘you’ etc.⁹ OCS mę, OPr. mien, on the other hand, reflect *h₁me to
which the acc.sg. ending *-m has been added (Olander 2015: 122-3). 

The list of shared lexemes provided by Porzig (1954: 164-9) is too short to
suggest a closer connection between Indo-Iranian and Balto-Slavic. It includes 
Skt. krsna-, Lith. kizsnas, OCS cron ‘black’ < *krs-no-; Skt. tucchya-, Lith. 
tuscias, OCS testo ‘empty’ < *tusk-io-; Av. spanta-, Lith. Sventas, OCS sveto 
‘holy’ < *Kuen-to- (possibly with Skt. sund- ‘success’, Hitt. kunna- ‘right, 
favourable’, Duchesne-Guillemin 1947). These are best explained as inherited 
from PIE. The suffix *-no- in Skt. daksina-, Lith. desinas, OCS desn» ‘right’ < 
*deks-(i-)no- may also be an archaism because the suffixes that we find in the 
other branches, Gr. óecióc < *deks-i-uo-, Goth. taihswa and Olr. dess ‘right’ < 
*deks-uo-, appear to have been taken over from PIE *lh₂ei-uo- ‘left’. The lack
of medial *-i- in the Slavic form is not easily explained as an innovation. Lith. 
dešinas and the Indo-Iranian forms may have been influenced by a lost adverb
*deks-i, which is often assumed to have existed (Beekes 1994: 90; Stüber 
2006). 


9 I owe this observation to Martin Kümmel.




The discussion above leads to the conclusion that there are hardly any facts 
that can be better explained if it is assumed that Balto-Slavic was itself part of 
a larger subgroup of Indo-European. 


15.4.2 Linguistic Contacts of Balto-Slavic and the Depalatalization 
of Palatovelars 


Although much is known about the linguistic contacts of West Baltic, East 
Baltic and Slavic when these were already separate branches, language contact 
dating back to the Balto-Slavic period is more difficult to establish. The part of 
the Balto-Slavic lexicon that was not derived from inherited Proto-Indo-
European material must have been borrowed from unknown contact languages,
but these languages are elusive. Many, if not all, non-Indo-European lexemes 
that can be reconstructed for Proto-Balto-Slavic also have reflexes in other 
branches of Indo-European, which Matasović (2013: 98) attributes to a lack of 
direct contact between Balto-Slavic and non-Indo-European languages. The 
borrowings would have entered Balto-Slavic via an Indo-European intermedi- 
ate. The main problem of this scenario is that the loanwords in question cannot 
have been borrowed directly from a known Indo-European language, for 
phonological reasons. At least one of the contact languages must have been 
an otherwise lost branch of Indo-European, perhaps the Temematic language 
argued for by Holzer (1989), cf. the discussion in Matasović 2013: 77-81, 
Kortlandt 2016b: 84 and Holzer 2018. More than one contact language is 
perhaps required, for example because the sound changes that would charac- 
terize Temematic, if real, are found only in part of the borrowed vocabulary. 
Kortlandt (2018a) argued for another Indo-European contact language, 
Venedic, “which contained an older non-Indo-European layer and was part of 
the Corded Ware horizon.”

There have been attempts to explain certain phonological peculiarities of 
Balto-Slavic as being due to language contact, but these have not been very 
successful. This can be illustrated by the so-called centum reflexes of the Indo-
European palatovelars, the first development on Matasović’s list cited in
Section 15.2.1. See Hock 2004: 11 for a survey of the relevant literature. 

The Indo-European palatovelars *ḱ, *ǵ and *ǵʰ are in most cases reflected as
sibilants in Baltic and Slavic, but both branches also have cases in which the 
palatovelars became velar occlusives. A detailed study of these cases reveals 
that the velar reflexes can in no way be regarded as being due to language 
contact, but must be due to a regular development in certain environments 
(Meillet 1894; Kortlandt 2009: 27-32; 2013b; Matasović 2005a). This is 
a priori an attractive scenario, because the words in question look like inherited 
Baltic and Slavic words in all other respects: there is no other phonetic or 
morphological reason to think that they might be loanwords and they do not 
belong to a part of the lexicon that typically contains loanwords (Čekman
1974: 130-1). Moreover, there is a distribution with regard to the environ-
ment in which the velar reflexes are found: they virtually only occur when the
following syllable contains a resonant or the semivowel *-u-. This suggests
that the velar reflex was regular before these sounds, in some cases with the
additional condition that a back vowel must follow. The original distribution
was somewhat obscured by the fact that quite a number of roots regularly
obtained variants with sibilant and velar reflexes, depending on the ablaut
grade. This variation was generally removed by analogy, unless there was
a semantic and/or morphological difference between the variants. Consider
the following examples of cognate words, which have both sibilant and velar
reflexes:

e OCS zelen, Ru. zelényj ‘green’, Lith. dial. Zeitas ‘greenish’, Latv. zelts 
‘gold’ < *gtel- 

* Lith. žãlias ‘green’, OCS zlato ‘gold’ < *g'ol- 

* Ru. žėltyj, Slk. zity ‘yellow’ < *g"l- 

Lith. geltas ‘yellow’ is a contamination of Proto-Balto-Slavic *dzelt-, cf. Latv.
zelts, and *gilt-, cf. Ru. žëltyj. It is of course arbitrary to assume that Lith. geltas
is a borrowing and that all the other forms are inherited.

e Lith. Zárdas ‘rack for drying flax’, Ru. zoród, ozoród ‘haystack’ < *gMord-o- 

e OCS Zredb, Ru. Zerd' ‘pole’ < *g™rd-i- 

e Lith. dial. s/dvé ‘honour’, OCS slovo ‘word’, slysati ‘to hear’ < *kleu-, *klüs- 

* Lith. Klausyti ‘to listen’ < *klous- 

Baltic preserves both kl- and šl-, while Slavic generalized sl-: OCS slušati ‘to
listen’, slava ‘fame’ < *ḱlous-, *ḱlōu-. Again, assuming that Lith. klausyti is
a borrowing is extremely unlikely, if only from a semantic point of view.

* Lith. šlieti, dial. šlinti ‘to lean’ < *ḱlei-, *ḱlin-

* OCS kloniti sę ‘to bow’, Ru. klonit’ ‘to incline’ < *ḱlon-.

With *s/- we find deverbal CS sloniti se ‘to lean’, a causative-iterative to *kli-n-. 

There is no reason to separate Slavic *klon- and *slon- (ESSJ 10: 67). PSI. 

*kloniti is probably a denominative to *klon» ‘inclination’, an o-stem derived 

from *kli-n- ‘to lean’ (cf. YAv. -srinaomi). 

e Lith. Sviésti, dial. svitéti, OCS svoteti se ‘to shine’ < *kueit-, *kuit- 

* OCS cvéte, Cz. květ ‘flower’ < *kuoit- 

OCS svéte, Ru. svet ‘light’ < *kuoit- is a younger deverbal derivative, while the 

initial consonant of OCS cvisti ‘to bloom’ is analogical after the noun ‘flower’. 

Latv. kvitét ‘to glimmer’ is identical to Lith. dial. Sviteti, OCS svoteti se, but has 

analogical k-. 

e Lith. sesuras ‘father-in-law’ < *suekur- 

* OCS svekry ‘mother-in-law’ < *suekru- 
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PSI. *svekro ‘father-in-law’ (not *svekore, cf. ORu. svekr» and the accent of 
Ru. svékor, Serb., Cr. svékar instead of Tsvekór, Tsvékar) is based on *svekry 
(Derksen 2008: 475). 

* Lith. akmuõ, OCS kamy ‘stone’ < *h₂eḱmō(n), -mon-

Cf. Skt. aśman- ‘stone’; the Slavic forms show metathesis *H...k > *k...H
after *ḱ > *k. Lith. ãšmenys ‘blade’, which is often considered to be a closely
related form with a sibilant reflex of the palatovelar, is much more likely to be
an inner-Baltic or Balto-Slavic men-stem derived from the root of aštrus
‘sharp’, like many other post-Proto-Indo-European men-stems in Baltic (cf.
Skardžius 1943: 293-4).

Within a single paradigm, the alternations caused by the depalatalization of 
palatovelars have not been preserved in the daughter languages; either the velar 
or the sibilant was generalized (see further Kortlandt 2013b): 

* Lith. Sirdis, OCS sredece ‘heart’ < *Kr -, cf. Lith. serdis ‘core, kernel’, OCS 
sreda ‘middle’ < *kerd-, OPr. seyr < *ker(d) 

e Lith. karvé, OCS krava ‘cow’ < *korh>-u-, cf. OPr. curwis ‘ox’ < *krh;-u- 

* Lith. pékus, OPr. pecku ‘cattle’, with *-k- from the oblique cases, cf. Skt. gen. 
sg. pasváh « *pek-u-os. 

e OCS zrono, OPr. syrne ‘grain’, Lith. Zirnis ‘pea’ < *érh;-n-, cf. OHG kerno 
‘kernel’ < *gerhz>-n-. 

e Lith. astrus, OCS ostro ‘sharp’ < *h „ek-ro-, with *-K- reintroduced from the 
comparative stem *h ek-i(e)s- and/or from derivatives, cf. OCS osla ‘whet- 
stone’, ostond ‘sharp point’, oste ‘thistle’. 

Depalatalization of palatovelars must have occurred in several stages, 
with e.g. depalatalization before *r already in Proto-Indo-European (Kortlandt 
2013b), but the important point with respect to the Balto-Slavic question is 
that no uniquely Slavic or Baltic change can be shown to have preceded it 
and that it is not a contact phenomenon. Explanations of the centum reflexes 
in Balto-Slavic that operate with unverifiable prehistoric dialectal differ- 
ences or large-scale diffusion from other branches of Indo-European, e.g. in 
the form of secondary satemization of Balto-Slavic (thus Mottausch 2006) 
or contact with otherwise unattested Indo-European substrata (thus Andersen 
2003: 53-8, 66), simply fail to explain the distribution of the velar reflexes. 

We can conclude that our present knowledge of the linguistic contacts of 
Proto-Balto-Slavic is very limited and confined to evidence from the lexicon. 


15.5 The Position of Balto-Slavic 


All linguistic evidence points to a Balto-Slavic proto-language that must have 
existed for a significant period after the disintegration of Proto-Indo-European. 
All shared innovations could have taken place before the first detectable 
isoglosses between Baltic and Slavic. Explanations for the data that do not 
depart from a single Balto-Slavic proto-language (e.g. Holzer 2001; Andersen
2003) are unnecessarily complicated and involve additional unfalsifiable 
dimensions such as shifting prehistoric dialects or otherwise unattested contact 
languages. The uniformity of this proto-language has often been questioned, as 
the following two quotations by Petit testify: 


Si le balto-slave a existé, ce n'est sûrement pas comme une langue totalement unifiée,
mais plutôt comme un groupe de dialectes perméables à la diffusion d'isoglosses.


[If Balto-Slavic has existed, it is surely not as a totally unified language, but rather as 
a group of dialects susceptible to the diffusion of isoglosses.] (Petit 2004: 35) 


No scholar would today seriously reconstruct a proto-language as free of internal 
variation as Schleicher did for Indo-European, and no scholar, not even the staunchest 
supporters of a proto-language common to Baltic and Slavic, would dare to write a tale 
in Balto-Slavic. (Petit 2018: 1971) 


I disagree with both statements. Proto-Balto-Slavic — the stage right before the 
first isoglosses between the three branches arose — may have been dialectally 
diversified, but this diversity cannot be reconstructed (see Section 15.3.1). 
There may have been a “Common Balto-Slavic” period, during which innov-
ations could have affected different subsets of predecessor dialects to West
Baltic, East Baltic and Slavic, but the evidence for such a period is limited to the 
handful of innovations potentially shared by East Baltic and Slavic (see 
Section 15.3).¹⁰ In fact, the linguistic data do not rule out a scenario in
which Proto-Balto-Slavic was a dialect or sociolect that was spoken by 
a relatively small group of people and that any related dialects or sociolects 
disappeared without leaving a trace. Because there is at present no compelling 
positive evidence in favour of internal variation in Proto-Balto-Slavic, we 
should indeed try to reconstruct a monolithic proto-language that contains the 
ancestors of all Baltic and Slavic forms and structures that are inherited from 
Proto-Indo-European as well as the results of the shared innovations of Baltic 
and Slavic. Villanueva Svensson (in press) rightly remarks that the reconstruc-
tion of such a proto-language “can be seen as a powerful heuristic device.”
Although it is of course not to be expected that we will ever be able to write 
a story in Balto-Slavic as well as a speaker of that language would have done, 
trying to do so would be a very useful way of demonstrating the gaps in our 
knowledge of Proto-Balto-Slavic (see Kortlandt 2010: 49 for an attempt to 
render Schleicher's fable in Proto-Balto-Slavic). 

If we take away the innovations that characterize Baltic and Slavic as 
individual branches, we are left with a language that is both phonologically 


10 Petit’s example of OPr. irmo ‘arm’ versus OCS ramo ‘shoulder’ can be explained from a Proto-Balto-Slavic
ablauting mn-stem (Pronk 2014).
and morphologically still quite close to reconstructed Proto-Indo-European. If
the Balto-Slavic proto-language is associated with the (earlier phases of the) 
Middle Dnieper culture, which seems reasonable, the split between Baltic and 
Slavic can be dated no later than the beginning of the second millennium BCE. 
The period of shared innovations would then have been up to 1,500 years, 
which does not seem to be too short or too long for the number of innovations 
that must have taken place. After the split, Baltic and Slavic developed 
independently for over two millennia, which accounts for some of the striking 
differences between Baltic and Slavic that prompted Meillet to doubt the 
existence of a shared proto-language in the first place (Rozwadowski 1912: 
17-18, 33). This is also the period during which speakers of Baltic and Slavic
shifted to a more agriculture-based mode of subsistence, as is shown by their 
distinct agricultural terminology (Pronk & Pronk-Tiethoff 2018). West and 
East Baltic remained in each other's vicinity for a longer time, which would 
explain how they borrowed the same words for certain woodland animals, as 
mentioned above. Eventually, Baltic and Slavic came into contact again as 
speakers of Slavic started to move north in the early Middle Ages. 

If we go further back in time, we can detect traces of contact between 
Proto-Balto-Slavic and one or more other languages that appear to be other- 
wise unknown to us. During the third millennium BCE, Proto-Balto-Slavic 
would have been spoken by people of the Middle Dnieper culture (see 
Section 15.1). Balto-Slavic was not part of a larger subgroup of Indo- 
European. There is insufficient support in the data for a prolonged period in 
which Proto-Balto-Slavic shared innovations with either Germanic or Indo- 
Iranian (see Section 15.4.1). This suggests that soon after the dissolution of 
Proto-Indo-European, the speakers of Proto-Balto-Slavic no longer regularly 
communicated with the speakers of the ancestors of these other branches, 
which is best explained by assuming that they had become geographically 
separated from each other. 
Albanian; see also Balkan languages
and Armenian; see Armenian and Albanian 
and Balto-Slavic, 226-8, 281 
and Balto-Slavic (language contact); see 
language contact, Albanian and Balto- 
Slavic 
and Celtic; see Celtic and Albanian 
and Germanic; see Germanic and 
Albanian 
and Greek; see Greek and Albanian 
and Greek (language contact); see language 
contact, Greek and Albanian 
and Indo-Iranian, 230-1 
and Italian (language contact); see 
language contact, Italian and Albanian 
and Italic; see Italic and Albanian 
and Latin (language contact); see language 
contact, Albanian and Latin 
and Messapic; see Messapic and Albanian 
alteuropäisch, 161 
analogy; see innovations, morphological 
Anatolian 
and Celtic, 149 
and Germanic, 166 
and Indo-Iranian, 263 
and Italic, 131 
and Tocharian, 11, 88 
and western subgroups, 76-7 
Anatolian hypothesis, 37, 60, 61 
archaisms, shared, 20, 21-6, 66, 102-3, 158, 
262-3, 280 
Armenian; see also Balkan languages 
and Albanian, 12, 216, 228-9 
and Germanic; see Germanic and Armenian 
and Greek; see Greek and Armenian 
and Phrygian; see Phrygian and Armenian 
augment, 4, 197, 209, 217, 225 


backmutation, 53, 57, 58 

Balkan languages, 12, 14, 163, 189, 192, 193, 
209, 216-17, 228, 233, 237-8, 239, 241; 
see also Albanian; Armenian; Greek; 
Macedonian, Ancient; Messapic; 
Phrygian 
and Germanic; see Germanic and Balkan 
languages 
and Indo-Iranian, 217 
and Tocharian; see Tocharian and Balkan 
languages 
Baltic 
and Slavic, 44, 45, 196, 269-70, 274-9, 286-7 
and Uralic (language contact); see 
language contact, Baltic and Uralic 
Balto-Slavic 
and Albanian; see Albanian and Balto-Slavic 
and Germanic; see Germanic and Balto- 
Slavic 
and Indo-Iranian; see Indo-Iranian and 
Balto-Slavic 
Bartholomae's Law 
Indo-European, 44, 246-7 
Indo-Iranian, 44, 246-7, 257 
Bayesian inference, 1, 3, 7, 36, 37, 38, 55, 59, 
60-1, 262; see also computational methods 
biology, evolutionary, 33-4, 36, 37, 59 
bootstrap values, 60-1 
Brugmann's Law (Indo-Iranian), 44, 164, 
248-9 


Celtic 
and Albanian, 229 
and Anatolian; see Anatolian and Celtic 
and Germanic, 11, 12, 149, 161-3 
and Greek, 178 
and Tocharian; see Tocharian and Celtic 
centum vs. satem, 136, 161, 192, 196, 209, 217, 
228-9, 238-40, 257-9, 264, 271, 281-2, 
283-5 
cognacy databases, 7, 36-8, 59, 272; see also 
Swadesh lists 
common language, definition, 8 
compensatory lengthening 
Albanian, 227, 232, 234, 235 
Doric, 188-9 
Eastern Ionic, 185 
Goidelic, 144 
Greek, 187, 232 
Indo-Iranian, 248 
Indo-Tocharian, 137 
Ionic-Attic, 185 
computational methods, 1-3, 4, 9, 10, 14-15, 
22, 23-4, 26, 33-48, 52-61, 109, 159, 
225, 241, 262; see also Bayesian inference 
contact; see language contact 
convergence, 26-9, 116-18, 122, 161, 162, 163, 
165, 182, 184-5, 189, 190-1, 223, 269 
Čop's Law (Anatolian), 68 
Core (Proto-)Indo-European; see Indo- 
Tocharian 
Cowgill's Law (Greek), 177; see also relative 
chronology 


dating, 2, 37-8, 58-61 
Proto-Anatolian, 75-6 
Proto-Balto-Slavic, 270-1 
Proto-Greek, 189 
Proto-Indo-European, 2, 37-8, 59-61, 90 
Proto-Indo-Tocharian, 90 
Proto-Luwic, 70-1 
Proto-Tocharian, 87-8 

deaspiration 
Balto-Slavic, 272 
Nuristani, 254 
Tocharian, 92 

depalatalisation 
Balto-Slavic, 271, 283-5 
Indo-Iranian, 252 
Nuristani, 254 

dialect continuum, 26-9, 147, 159-60, 
184, 275 
dialects in proto-languages, 7, 26-9, 
274-6, 286 

DNA; see genetics 

Dybo’s Shortening 
Celtic, 137 
Germanic, 137 
Italic, 137, 210 


Fortunatov’s Law (Indic), 250 
Francis’ Law (Greek), 210 


genetics, 38, 61, 122, 270, 271 
Germanic 
and Albanian, 163, 229-30 
and Albanian (language contact); see 
language contact, Germanic and 
Albanian 
and Anatolian; see Anatolian and Germanic 
and Armenian, 165-6 
and Balkan languages, 163 
and Balto-Slavic, 12-13, 163-5, 279-81 
and Celtic; see Celtic and Germanic 
and Greek, 163, 178 
and Illyrian; see Illyrian and Germanic 
and Italic; see Italic and Germanic 
and Latin (language contact); see language 
contact, Germanic and Latin 
and Messapic; see Messapic and Germanic 
and Thracian; see Thracian and Germanic 
and Tocharian; see Tocharian and Germanic 
and Venetic; see Venetic and Germanic 
glottochronology, 33-4, 35, 262 
Graeco-Armenian; see Greek and Armenian 
Graeco-Aryan; see Greek and Indo-Iranian 
grammaticalisation, 24, 25, 155 
Baltic, 277 
Balto-Slavic, 273 
Greek, 180 
Greek and Armenian, 212 
Italic, 127 
Luwic, 68 
North Germanic, 157 
Tocharian, 88 
Grassmann's Law 
Armenian, 248 
Greek, 247 
Indo-Iranian, 247-8 
Latin, 116, 248 
Macedonian, Ancient, 191 
Tocharian, 92, 248 
Greek; see also Balkan languages 
and Albanian, 12, 193, 225-6, 231-7, 
241 
and Albanian (language contact); see 
language contact, Albanian and Greek 
and Ancient Macedonian; see Macedonian, 
Ancient, and Greek 
and Armenian, 12, 14, 45, 193-6, 209-16, 
225-6 
and Celtic; see Celtic and Greek 
and Germanic; see Germanic and Greek 
and Indo-Iranian, 12, 178, 196-7, 260, 262 
and Italic; see Italic and Greek 
and Messapic; see Messapic and Greek 
and Phrygian; see Phrygian and Greek 
and Venetic; see Venetic and Greek 


Hirt's Law (Balto-Slavic), 271 
Holtzmann's Law (Germanic), 154 
homeland 
definition, 7-8 
Indo-European, 2, 4, 27, 37, 60, 61; see also 
steppe hypothesis; Anatolian hypothesis 
Illyrian, 161 
and Albanian, 241 
and Germanic, 163 
Indic 
and Iranian, 44, 45, 196, 251-7 
and Nuristani, 251-7 
Indo-Anatolian; see Indo-European 
Indo-Anatolian hypothesis, 1-2, 5, 11, 13, 66, 
77-9, 89-90, 168, 261-2 
Indo-Aryan; see Indic 
Indo-Hittite hypothesis; see Indo-Anatolian 
hypothesis 
Indo-Iranian 
and Albanian; see Albanian and Indo-Iranian 
and Anatolian; see Anatolian and Indo- 
Iranian 
and Balkan languages; see Balkan languages 
and Indo-Iranian 
and Balto-Slavic, 12-13, 281-2 
and Greek; see Greek and Indo-Iranian 
and Italo-Celtic; see Italo-Celtic and Indo- 
Iranian 
and Uralic (language contact); see language 
contact, Indo-Iranian and Uralic 
Indo-Tocharian (Core Indo-European, Nuclear 
Indo-European, non-Anatolian Indo- 
European), 66, 77-9, 89-90, 122, 197, 
232, 239, 251, 280 
Indo-Tocharian hypothesis, 89-97, 168, 
261-2 
innovations, shared, 21-6, 225-6 
lexical, 22, 23, 54, 91 
morphological, 23, 25, 43, 45-7, 54, 
56 
phonological, 22, 54 
syntactic, 23-4 
weighting, 25 
intermediate proto-language, definition, 
8-9 
Iranian 
and Indic; see Indic and Iranian 
and Nuristani, 251-7 
and Tocharian (language contact); see 
language contact, Tocharian and Iranian 
Italian 
and Albanian (language contact); see 
language contact, Albanian and Italian 
Italic 
and Albanian, 230 
and Anatolian; see Anatolian and Italic 
and Celtic, 11, 44, 102-8, 131, 149 
and Germanic, 12, 130, 161-2 
and Greek, 130-1, 178 
and Slavic, 131 
Italo-Celtic; see also Italic and Celtic 
and Indo-Iranian, 108-9, 111 
Joseph's Law (Celtic), 139-40 
Joseph's Law, expanded (Brittonic and 
Gaulish), 147 
Junggrammatiker; see neogrammarians 


Kluge's Law (Germanic), 153 


language contact, 20, 24; see also substrate 
Albanian, 223, 225 
Albanian and Balto-Slavic, 227 
Albanian and Greek, 225 
Albanian and Italian, 225 
Albanian and Latin, 227 
Baltic and Uralic, 271 
Balto-Slavic, 283-5 
Brittonic and Gallo-Romance, 145 
Germanic, 37 
Germanic and Albanian, 227 
Germanic and Latin, 154 
Indo-Iranian, 12, 258, 264 
Indo-Iranian and Uralic, 264 
Italo-Celtic, 110 
Tocharian, 84 
Tocharian and Iranian, 84-5, 87, 88 
Tocharian and Samoyedic, 84 
Tocharian and Uralic, 84, 87, 91 
laryngeal metathesis 
Indo-European, 93 
laryngeals, loss 
Armenian, 204 
Balto-Slavic, 272, 282 
Celtic, 137-8 
Greek, 175, 177 
Indo-Iranian, 248, 282 
Italic, 118 
Latin 
and Albanian (language contact); see 
language contact, Albanian and Latin 
and Germanic (language contact); see 
language contact, Germanic and Latin 
lengthening; see compensatory lengthening 
lenition 
Anatolian, 66, 67 
Armenian, 205 
Brittonic, 146 
Celtic, 142 
Greek, 175, 177 
Old Irish, 144 
Phrygian and Greek, 192 
Sabellic, 124 
lexical clock, 59 
lexical data; see cognacy databases 
lexicostatistics, 33-4, 35, 36 
Lidén's Law (Balto-Slavic), 271 
Lindeman's Law, 118 
Lusitanian, 22, 111, 135 
Macedonian, Ancient, 12, 193; see also Balkan 
languages 
and Greek, 12, 191 
and Phrygian, 191 
and Thracian, 191 
maximum compatibility, 1, 53 
maximum parsimony, 53 
merger, phonological, 22, 54, 56 
Messapic, 10, 14; see also Balkan languages 
and Albanian, 240-1, 249 
and Germanic, 163 
and Greek, 193 
methodology, 9, 10, 13, 14-15, 18-29, 33-48, 
52-61, 90-1; see also Bayesian inference; 
computational methods; glottochronology; 
lexicostatistics 


neogrammarians, 19-21, 22, 26-7 
network, 28, 57-8 
non-Anatolian (Proto-)Indo-European; see 
Indo-Tocharian 
Nuclear (Proto-)Indo-European; see Indo- 
Tocharian 
Nuristani 
and Indic; see Indic and Nuristani 
and Iranian; see Iranian and Nuristani 


outgroup analysis, 52-3, 57 


Palaeo-Balkanic; see Balkan languages 
paradigmatisation 
Germanic, 154-5 
parallel development, 45, 53, 54, 57, 58, 
90 
para-proto-language, definition, 8 
perfect phylogeny, 37, 57 
Phrygian, 10, 12, 14, 22, 109, 237; see also 
Balkan languages 
and Ancient Macedonian; see Macedonian, 
Ancient, and Phrygian 
and Armenian, 12, 191, 209, 212, 216, 217 
and Greek, 12, 14, 177, 178, 190, 191-3, 
212, 216, 217 
Pictish, 135 
Pre-Greek; see substrate in Greek 
Pre-Samnite, 106, 115, 118, 125 
preterite-presents (Germanic), 154, 167-8 
productivity, sustained 
Germanic, 168 
prothetic vowels 
Greek, 188 
Greek and Armenian, 204 
Greek and Phrygian, 192 
Greek, Phrygian and Armenian, 192, 217 
Proto-Indo-European, definition, 8 
proto-language, definition, 7-8 
pruning, 28-9 


quantitative methods; see computational 
methods 


Rask/Grimm's Law (Germanic), 153 
relative chronology, 23, 24-5, 29, 34, 38-41, 
43-7, 58; see also dating 
Armenian, 203 
Balto-Slavic, 44, 271-2 
Celtic, 136 
Germanic, 54-6, 153 
Greek, 175-6, 182 
Greek and Armenian, 45 
Indo-Iranian, 44 
Italic, 24-5 
Italo-Celtic, 44, 103-4, 108 
Tocharian, 84-5 
retentions; see archaisms, shared 
rooting of tree, 56 
ruki change, 27 
Albanian, 258 
Armenian, 206, 258, 282 
Balto-Slavic, 258, 271 
Indo-Iranian, 249-50, 258 
Luwian, 258-9, 282 


Samoyedic 
and Tocharian (language contact); see 
language contact, Tocharian and Samoyedic 
satem vs. centum; see centum vs. satem 
Sicel, 115-16, 118 
Slavic 
and Baltic; see Baltic and Slavic 
and Italic; see Italic and Slavic 
speech community, 8, 18, 24, 26-9 
Stang's Law (Indo-European), 121 
steppe hypothesis, 37, 60, 61 
subgrouping, history of, 18-21 
substrate; see also language contact 
in Albanian, 240 
in Armenian, 209 
in Balto-Slavic, 283-5 
in Greek, 180-1 
in Greek and Armenian, 214-15 
in Indo-Iranian, 249 
in northwestern languages, 110 
in Tocharian, 84, 91 
Swadesh lists, 36, 127-8, 272; see also cognacy 
databases 


Tartessian, 135 
thorn clusters (Indo-European), 108, 136, 177, 
232, 239-40, 252 
Thracian 
and Ancient Macedonian; see Macedonian, 
Ancient, and Thracian 
and Germanic, 163 
Thurneysen's Law (Italic), 104 
Thurneysen-Havet's Law (Italic), 119 
time depth; see dating 
Tocharian 
and Anatolian; see Anatolian and 
Tocharian 
and Balkan languages, 216 
and Celtic, 149 
and Germanic, 166 
and Uralic (language contact); see language 
contact, Tocharian and Uralic 


Uralic 
and Baltic (language contact); see language 
contact, Baltic and Uralic 
and Indo-Iranian (language contact); see 
language contact, Indo-Iranian and 
Uralic 
and Tocharian (language contact); see 
language contact, Tocharian and Uralic 


Venetic, 11, 22, 106, 115, 116, 117, 118, 119, 
120, 121, 122, 123, 124, 127, 128, 161 
and Germanic, 130, 162, 165 
and Greek, 193 
Verner's Law (Germanic), 55, 153, 156 
Verschärfung, 158, 159-60 


Wackernagel's Law, 149 
wave model, 20, 26, 28-9, 159-60, 240-1 
weighting; see innovations, weighting 
Winter's Law 
Albanian, 227, 235 
Balto-Slavic, 164, 227, 271, 275-6 